Overlap and frontiers between behavioral and RTL synthesis

Wander O. Cesário†, Zoltan Sugar†, Rodolphe Suescun‡ and Ahmed A. Jerraya†

Abstract

Scheduling, resource allocation and binding are traditionally classified as behavioral synthesis tasks. However, advanced RTL synthesis tools can execute the last two tasks. Hence, an overlap of functionality will be found in most new design environments. In this paper, we present a new design flow with flexible frontiers between behavioral and RTL synthesis tools. Our results show that it is worthwhile to give the designer this extended degree of freedom, since the best solution is not always the result of one specific design flow.
1. Introduction

Behavioral and Register-Transfer Level (RTL) synthesis work on different levels of design abstraction. For synchronous digital systems, the essential difference between these levels of abstraction is the synchronization points used [1].

At the behavioral level of abstraction, the design is described as an algorithm. Synchronization points are the interactions of the system with the external environment (I/O operations). In VHDL, these correspond to wait statements; in StateChart, to a state defined by the language. Scheduling transforms the initial model into an FSM in which each transition is a control step. Generally, a transition takes a single clock cycle to execute. This model is called FSMD [2]. Functional unit allocation/binding steps link the operations (e.g. arithmetic operations) to components from the library. Interconnect allocation generates muxes and buses in order to form a datapath. The final result is the well-known controller/datapath architecture used by most behavioral synthesis tools.

It happens that RTL synthesis performs some of the steps carried out by behavioral synthesis. At the RTL abstraction level, synchronization is done explicitly by one (or several) clock signal(s). The code executed during a transition is made of basic computations or register transfers. For instance, it cannot include data-dependent loops. Therefore, clock cycle accurate timing information is fundamental for RTL synthesis.

The precision of the design model produced by behavioral synthesis depends on the tasks accomplished. Each synthesis step refines the initial model. Executing all these tasks produces a controller/datapath architecture. In order to profit from this overlap of functionality, current behavioral synthesis design flows must be changed. It should be possible to interface with RTL synthesis at any stage of the synthesis flow.
This represents an extra degree of freedom for the synthesis process, since a constrained architectural model is not imposed as the single possible interface between behavioral and RTL synthesis tools. As the quality of the synthesis result depends not only on the quality of the tools used but also on the design itself, it is imperative to give the designer the possibility to explore all possible synthesis flows.

We propose a flexible synthesis flow where the frontiers between behavioral synthesis and RTL synthesis are not rigid. This possibility, in conjunction with the manipulation of design constraints, gives the designer much more control over the whole synthesis process. A flexible synthesis flow allows the designer to adapt the design methodology to the nature of the design and the quality of the synthesis tools.

After an overview of previous work in section 2, section 3 shows how behavioral synthesis can interface with RTL synthesis at different points of the synthesis flow. Section 4 presents our flexible behavioral synthesis design flow. Section 5 gives results for various benchmarks passing through diverse synthesis flows. Finally, we draw some conclusions in section 6.
† Laboratory TIMA-CMP, 46 avenue Felix Viallet, 38031 Grenoble France.
‡ AREXSYS, 1 Chemin du Pré Carré, 38240 Meylan France.
2. Previous Work

Traditional behavioral synthesis flows use a fixed sequence of synthesis tasks. They produce a controller/datapath architecture that is used to interface with RTL synthesis tools. This task flow is usually a sequence of scheduling and resource allocation (in any order) followed by resource binding. Scheduling and resource allocation tasks are interdependent. However, most current design flows execute them as independent tasks. Some systems [10] solve scheduling, resource allocation and binding as a single optimization problem, but complexity restricts the practical applicability of this approach. There is no pre-defined order for doing scheduling and resource allocation. As said before, it is only after scheduling that a clock cycle accurate model of the design can be obtained. In this paper, we assume that scheduling is executed first.

The underlying models commonly used as design representations are Control, Data, or Control-Data Flow Graphs (CFG, DFG, CDFG) for the behavioral input description and FSMD for the synthesis result and for some refinement steps. The behavioral FSMD is represented as a transition table. Each transition may be executed in a single clock cycle. Operations associated with each transition of this description may be executed in parallel. For each transition, the slowest operation defines its effective delay. Here, it is assumed that all operations can be executed in one clock cycle. We will discuss later how to accommodate other types of operations in this model. Each controller transition is defined by the current state, the condition to be satisfied and a set of operations or actions. The condition that evaluates to true determines the transition to be taken and thus the actions to be executed.

The MEBS [3] synthesis flow is able to generate VHDL descriptions after each behavioral synthesis task. However, these descriptions cannot be used for synthesis, only for simulation.
The Interactive Synthesis Environment (ISE) [4] has a flexible design flow, since it allows behavioral synthesis tasks to be performed in any order. Nevertheless, it is also an integrated system where RTL synthesis and some floor planning are all done in parallel. Interfacing with other synthesis tools is only possible at a low abstraction level. HYPER optimizations can be controlled by a design guidance system [5], but the guidance is not applied to the sequence of synthesis tasks. The ADAM Design Planning System [6] can find a good sequence of design tasks for a given set of specifications. Still, it does not have a flexible interface with RTL synthesis.

We propose a flexible behavioral synthesis flow in conjunction with a flexible RTL synthesis interface. In our behavioral synthesis system, designer interaction is not restricted to giving design constraints. It is possible to define the design flow and to select the set of tools that execute the synthesis tasks. For instance, when estimating design parameters, it is possible to choose between fast behavioral level estimators [7] and precise but slow RTL estimators. Hence, effective design space exploration can be achieved and the usability of behavioral synthesis is increased [8].
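The FSMD transition table described in this section can be illustrated with a small simulator. The following is a minimal sketch, not the representation used by any of the cited tools: the class name Transition, the state names, the GCD table and the operation delays are all hypothetical and chosen only for illustration. It shows the three elements that define a transition (current state, condition, actions), the parallel execution of the actions within one clock cycle, and the rule that the slowest operation defines the delay of a transition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Env = Dict[str, int]  # register and port values


@dataclass
class Transition:
    state: str                                 # current state
    cond: Callable[[Env], bool]                # condition to be satisfied
    actions: Dict[str, Callable[[Env], int]]   # register transfers, executed in parallel
    next_state: str
    op_delays_ns: List[float]                  # hypothetical delays of the operations

    def delay(self) -> float:
        # The slowest operation defines the effective delay of the transition.
        return max(self.op_delays_ns, default=0.0)


# Hypothetical FSMD for a subtraction-based GCD (illustration only).
GCD_TABLE = [
    Transition("READ", lambda e: True,
               {"a": lambda e: e["xin"], "b": lambda e: e["yin"]}, "LOOP", []),
    Transition("LOOP", lambda e: e["a"] > e["b"],
               {"a": lambda e: e["a"] - e["b"]}, "LOOP", [2.0, 3.0]),
    Transition("LOOP", lambda e: e["a"] < e["b"],
               {"b": lambda e: e["b"] - e["a"]}, "LOOP", [2.0, 3.0]),
    Transition("LOOP", lambda e: e["a"] == e["b"],
               {"out": lambda e: e["a"]}, "DONE", [2.0]),
]


def run(table: List[Transition], env: Env, start: str, final: str,
        max_cycles: int = 1000) -> Tuple[Env, int]:
    """Execute one transition per clock cycle until the final state is reached."""
    state, cycles = start, 0
    while state != final:
        if cycles >= max_cycles:
            raise RuntimeError("FSMD did not reach the final state")
        # The condition that evaluates to true selects the transition.
        t = next(t for t in table if t.state == state and t.cond(env))
        # All actions read the old register values, then commit together.
        updates = {reg: f(env) for reg, f in t.actions.items()}
        env.update(updates)
        state = t.next_state
        cycles += 1
    return env, cycles


def min_clock_period(table: List[Transition]) -> float:
    """The clock period must cover the slowest transition."""
    return max(t.delay() for t in table)
```

With the inputs xin = 12 and yin = 18, run(GCD_TABLE, {"xin": 12, "yin": 18}, "READ", "DONE") reaches the final state after 4 clock cycles with out = 6, and min_clock_period(GCD_TABLE) returns 3.0 for the delays assumed above.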
3. Behavioral and RTL Synthesis Overlap

Synthesis tools handle the design at different abstraction levels, ranging from system level to physical level. Design models and synchronization points are different for each abstraction level. In this section, we study the design models used during the synthesis flow, from the system level down to the RT-level.
3.1. Abstraction Levels for Synthesis

As we go from the highest to the lowest abstraction level, diverse synchronization points and design models are used, as shown in Table 1. At the system level, we work with communicating processes that synchronize through message exchange. After partitioning [9], each process can be represented at the algorithmic level by a control/data flow graph that synchronizes through I/O events. This may also be represented using a finite state machine with datapath (FSMD) model. Behavioral synthesis takes the flow graph and produces an RTL model. This model is generally represented as a controller/datapath architecture. At the RT-level, data transfers are synchronized at clock cycle boundaries. RTL and logic synthesis map controller and datapath components to a cell library to produce a gate netlist. Finally, layout synthesis produces the final chip layout. At the physical level, wire value changes define the valid data.
Abstraction level    Synchronization points    Input design model
System level         Inter-process messages    Communicating process
Algorithmic level    I/O events                CFG, DFG, CDFG, FSMD
RT-level             Clock                     FSM, BDD, Boolean equations
Physical level       Wire value change         Gate netlist and Layout models

Table 1 - Synchronization and design models on different abstraction levels
3.2. Scope of Behavioral and RTL Synthesis
Table 2 details the different synthesis tasks executed by behavioral and RTL synthesis tools. Design models are also detailed for each task. This table shows clearly that there can be a functionality overlap between behavioral and RTL synthesis tools (the rows marked as overlap). Design input models for RTL synthesis range from cycle-true FSMDs to completely specified architectures. All these models have something in common: they synchronize at clock cycle boundaries, i.e., they are clock cycle accurate. This level of precision is only attained after the scheduling task of behavioral synthesis. The clock cycle accurate FSMD design model is the key to the integration of behavioral and RTL synthesis. It can be seen in Table 2 that this model can be used for doing resource allocation and binding in the region of functionality overlap.

Synthesis task         Synchronization points    Input design model     Performed by
Scheduling             I/O events                CFG                    Behavioral synthesis
Resource allocation    Clock                     FSMD                   Behavioral or RTL synthesis (overlap)
Resource binding       Clock                     FSMD with resources    Behavioral or RTL synthesis (overlap)
Logic synthesis        Clock                     FSM + DP               RTL synthesis
Layout synthesis       Wire value change         Gate netlist           -

Table 2 - Scope of behavioral and RTL synthesis

Resource allocation and binding produce an FSMD with resources. Storage allocation and binding assign registers/memories to variables and arrays. Functional unit allocation assigns operators to operations. Interconnection allocation defines paths between storage and computation cells. Finally, a controller/datapath architecture is created. Information about resources need not be exhaustive, since we may be interested in doing resource allocation and binding in two stages. For instance, complex operations can be treated by behavioral synthesis while the simple ones are transferred to RTL synthesis. In this case, we must be able to translate an FSMD with partial resource information into a format accepted by RTL synthesis. More details on this flexible interface between behavioral and RTL synthesis are given in the next section.

Complex operations (i.e., those that need a data-dependent number of clock cycles to execute) and multicycle operations are not allowed in the clock cycle accurate FSMD design model. There are two possible solutions to this problem. The first solution is to treat complex operations as procedure calls and associate them with external functional units. Procedure calls are used to start these external functional units and to get their results [11]. Each procedure call must take only one cycle. In this case, procedure calls need to be handled by the scheduler. The second solution is to describe complex operations by procedures that use only simple operations. The procedure calls can then be expanded inline [12] and scheduled with the rest of the description [13].
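The two-stage resource binding mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the interface format of any particular tool: the names Operation and partial_binding, the operation kinds and the unit naming scheme are hypothetical. Operations of the kinds selected for behavioral synthesis are bound to named functional units, with one unit per concurrent operation in a control step, while all other operations are left unbound (None) so that RTL synthesis can allocate and bind them. The sketch also rejects operations that do not take exactly one clock cycle, since they are not allowed in the clock cycle accurate FSMD.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Operation:
    name: str
    kind: str                   # e.g. "add", "mul" (hypothetical kinds)
    step: int                   # control step (FSMD transition) assigned by scheduling
    cycles: Optional[int] = 1   # None models a data-dependent latency


def partial_binding(ops: Iterable[Operation],
                    behavioral_kinds: Set[str]) -> Dict[str, Optional[str]]:
    """Bind only the selected kinds; leave the rest (None) to RTL synthesis."""
    ops = list(ops)
    illegal = [op.name for op in ops if op.cycles != 1]
    if illegal:
        # Multicycle or data-dependent operations must first be turned into
        # single-cycle procedure calls or expanded inline.
        raise ValueError(f"not clock cycle accurate: {illegal}")

    binding: Dict[str, Optional[str]] = {}
    used: Dict[Tuple[int, str], int] = {}  # (step, kind) -> units taken in that step
    for op in ops:
        if op.kind not in behavioral_kinds:
            binding[op.name] = None
            continue
        index = used.get((op.step, op.kind), 0)
        used[(op.step, op.kind)] = index + 1
        # Operations in different steps reuse the same unit index (sharing).
        binding[op.name] = f"{op.kind}{index}"
    return binding


OPS = [
    Operation("m1", "mul", 0), Operation("m2", "mul", 0), Operation("a1", "add", 0),
    Operation("m3", "mul", 1), Operation("a2", "add", 1),
]
BINDING = partial_binding(OPS, {"mul"})
UNITS = sorted({unit for unit in BINDING.values() if unit is not None})
```

In this example the three multiplications share two multipliers (mul0 and mul1), and the additions carry no resource information, which corresponds to an FSMD with partial resource information.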
4. Flexible Synthesis Flow

This section deals with MUSIC, a flexible synthesis environment (see Figure 1). The first task in the MUSIC synthesis flow is VHDL compilation (task A, see Figure 1). It generates a CFG that is used by control-flow scheduling (CFS, task B). Scheduling produces a clock cycle accurate design model. Generally, behavioral synthesis tools assume a certain design style: control-flow or data-flow oriented. Each design style calls for a specialized kind of scheduling tool: control-flow based and data-flow based, respectively. MUSIC accepts mixed style designs; first, CFS is applied to the entire CFG. This results in an FSMD where each transition may hide complex computations that may be represented using a DFG. When needed, data-flow scheduling (DFS, task D) is applied to each transition. Resource allocation and binding deal with functional units (task C) and with memory and interconnections (task E).

Diverse design flows or synthesis paths are possible with MUSIC (see Figure 1). Each path going from node A to node F constitutes a possible synthesis flow. If there are no multi-cycle/complex operations to share resources, FU allocation can be skipped. On some control-flow dominated designs, it is possible to go to RTL synthesis just after CFS, skipping the rest of the behavioral synthesis flow. On data-flow oriented designs, DFS can be done before or after FU allocation. On some control-flow oriented designs, DFS may not be necessary.

[Figure 1: flow graph of the behavioral synthesis tasks, with nodes A (Behavioral VHDL compilation), B (Control-flow scheduling), C (FU allocation/binding), D (Data-flow scheduling), E (Memory and interconnection allocation/binding) and F (RTL synthesis).]

Figure 1 – MUSIC Flexible Behavioral Synthesis Design Flow

In the next section, we try a few of the synthesis flows allowed by the MUSIC system on a set of high level synthesis benchmarks [14].
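The statement that each path from node A to node F is a possible synthesis flow can be made concrete by enumerating such paths. The sketch below is illustrative only: the edge list EDGES is an assumption inferred from the text of this section (CFS may lead directly to RTL synthesis, FU allocation may be skipped, DFS may come before or after FU allocation), and the actual edges of Figure 1 may differ. The function name synthesis_paths is likewise hypothetical.

```python
from typing import Dict, List

TASKS: Dict[str, str] = {
    "A": "Behavioral VHDL compilation",
    "B": "Control-flow scheduling",
    "C": "FU allocation/binding",
    "D": "Data-flow scheduling",
    "E": "Memory and interconnection allocation/binding",
    "F": "RTL synthesis",
}

# ASSUMPTION: edges inferred from the prose, not read from the figure itself.
EDGES: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["C", "D", "E", "F"],
    "C": ["D", "E"],
    "D": ["C", "E"],
    "E": ["F"],
    "F": [],
}


def synthesis_paths(edges: Dict[str, List[str]],
                    start: str = "A", goal: str = "F") -> List[str]:
    """Enumerate all paths from start to goal that visit each task at most once."""
    paths: List[str] = []

    def walk(node: str, path: List[str]) -> None:
        if node == goal:
            paths.append("".join(path))
            return
        for nxt in edges[node]:
            if nxt not in path:  # a task is never executed twice
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


PATHS = synthesis_paths(EDGES)
```

Under the assumed edges, PATHS contains the flows ABF, ABCEF and ABDCEF, together with the other orderings permitted by the edge list.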
5. Results

MUSIC is able to treat designs that mix control and data flow styles, so benchmarks were chosen to represent both styles. Greatest common divisor (GCD) is a small control-flow oriented design. The fixed-point unit (FPU) is capable of doing addition, subtraction, multiplication and division; it can be classified as a mixed style design. QRS is a medical application that could be used in a heart rate monitor; it is a control-flow oriented design. The fifth order elliptical wave filter (ELLIPF) is a well-known data-flow benchmark.

Table 3 summarizes the results for this set of benchmarks using MUSIC behavioral synthesis. Task F was accomplished by a commercial RTL synthesis tool to produce a gate netlist. The values presented in this table come from RTL estimators for increased accuracy. The number of operations (Op #) and the number of FSMD states give an idea of the complexity of each design. Dynamic power values assume a 10 MHz clock.

Design    Synthesis path    Op #    FSMD state #    Area     Critical path (ns)    Dyn. Power (mW)
GCD       SP1               2       4               1319     40.58                 0.86
GCD       SP2               2       4               1200     43.11                 0.74
FPU       SP1               14      16              10147    103.5                 5.79
FPU       SP2               14      16              11473    119.1                 4.14
FPU       SP3               14      34              14855    81.21                 5.66
QRS       SP1               32      75              7302     49.07                 3.48
QRS       SP2               32      75              7166     44.04                 2.86
ELLIPF    SP1               34      14              6659     29.00                 6.02
ELLIPF    SP2               34      14              7280     31.83                 7.51
ELLIPF    SP3               34      17              5865     35.36                 5.34

Table 3 - Synthesis results for different design flows
For each example, we applied several synthesis paths (SPs). For this set of designs, we selected three design flows. The first (SP1) performs only CFS (path ABF) and produces a VHDL description that can be handled directly by RTL synthesis. In the second flow (SP2), resource allocation/binding is done just after CFS (path ABCEF). The third flow (SP3) includes DFS in the synthesis path (path ABDCEF). DFS can be skipped on GCD and QRS because they are control-flow oriented: for these designs, path SP3 gives the same results as path SP2.

In the case of GCD, the best timing is obtained with path SP1, while dynamic power and area are smaller with path SP2. Behavioral-level resource allocation/binding leads to the best solution for QRS (path SP2).

Synthesis path SP1 gives the solution with the smallest area for the FPU design. Using path SP2, we obtained some improvement in power consumption. Finally, DFS (path SP3) reduces the depth of operator chaining and improves timing. Of course, this solution uses more area, and the number of states more than doubles (from 16 to 34).

ELLIPF is a pure data-flow oriented application. The fastest solution is obtained with synthesis path SP1. Using DFS (path SP3), we are able to produce a solution with one multiplier and two adders, while the other two solutions have two multipliers and six adders. This solution is better in terms of area and power consumption.

It is clear that the results of the synthesis flow depend on the application under synthesis. In the case of our benchmarks, none of the flows tried gives the best results for all the applications. Each flow proved to be efficient for some applications. Some optimizations made by RTL synthesis are not possible when it receives controller/datapath hardware structures.
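The comparison above can be reproduced mechanically from the values of Table 3. The following sketch selects, for each design and each metric, the synthesis path with the smallest value, and then checks whether a single path wins everywhere. The figures are copied from Table 3; the names RESULTS, best_paths and universal_winner are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, Optional, Tuple

METRICS = ("area", "critical_path_ns", "dyn_power_mw")

# (design, synthesis path) -> (area, critical path in ns, dynamic power in mW),
# values taken from Table 3.
RESULTS: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("GCD", "SP1"): (1319, 40.58, 0.86),
    ("GCD", "SP2"): (1200, 43.11, 0.74),
    ("FPU", "SP1"): (10147, 103.5, 5.79),
    ("FPU", "SP2"): (11473, 119.1, 4.14),
    ("FPU", "SP3"): (14855, 81.21, 5.66),
    ("QRS", "SP1"): (7302, 49.07, 3.48),
    ("QRS", "SP2"): (7166, 44.04, 2.86),
    ("ELLIPF", "SP1"): (6659, 29.00, 6.02),
    ("ELLIPF", "SP2"): (7280, 31.83, 7.51),
    ("ELLIPF", "SP3"): (5865, 35.36, 5.34),
}


def best_paths(results: Dict[Tuple[str, str], Tuple[float, float, float]]
               ) -> Dict[Tuple[str, str], str]:
    """Return {(design, metric): path with the smallest value of that metric}."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for (design, path), values in results.items():
        for metric, value in zip(METRICS, values):
            key = (design, metric)
            if key not in best or value < best[key][1]:
                best[key] = (path, value)
    return {key: path for key, (path, _) in best.items()}


def universal_winner(best: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the path that is best for every design and metric, if one exists."""
    winners = set(best.values())
    return winners.pop() if len(winners) == 1 else None


BEST = best_paths(RESULTS)
WINNER = universal_winner(BEST)
```

For these values, BEST maps FPU to SP1 for area, SP3 for critical path and SP2 for dynamic power, QRS to SP2 for all three metrics, and WINNER is None, which is consistent with the observation that no single flow gives the best results for all applications.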
6. Conclusion

In this paper, we explored the overlap of functionality between behavioral and RTL synthesis tools. We demonstrated that there are several design flows for combining behavioral and RTL synthesis. Our results show that the design flow that leads to the best solution is not always the same. It depends not only on the quality of the synthesis tools used and the nature of the design itself, but also on the quality of the interface between behavioral and RTL synthesis tools. It is clearly difficult to have a single synthesis path optimized for all kinds of applications. Nevertheless, a system with a flexible synthesis flow has the major advantage of being easily adaptable to a large collection of applications.
7. Bibliography

[1] D. Ku and G. De Micheli, “High-Level Synthesis of ASICs under Timing and Synchronization Constraints”, Kluwer Academic Publishers, 1992.
[2] D. Gajski, N. Dutt, A. Wu, and Y. Lin, High-Level Synthesis: Introduction to Chip and System Design, Kluwer Academic Publishers, Boston, Massachusetts, 1992.
[3] F. Tsai and Y. Hsu, “STAR: a system for hardware allocation in data path synthesis”, IEEE Trans. CAD, p. 1053-1064, Sept. 1992.
[4] D. Gajski, T. Ishii, V. Chaiyakul, H. Juan and T. Hadley, “Interactive Behavioral synthesis”, SASIMI, 1996.
[5] L. Guerra, M. Potkonjak and J. Rabaey, “A Methodology for Guided Behavioral-Level Optimization”, Design Automation Conference, p. 309-314, 1998.
[6] D. Knapp, A. Parker, “The ADAM design planning engine”, IEEE Trans. on Computer-Aided Design, vol. 10, no. 7, p. 829-46, 1991.
[7] F. Kurdahi, N. Dutt and S. Ohm, “A Unified Lower Bound Estimation Technique for High-Level Synthesis”, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, May 1997.
[8] S. Krolikoski, “Considerations on the Usability of Behavioral Synthesis”, Design, Automation and Test in Europe, 1998.
[9] R. K. Gupta and G. De Micheli, “Partitioning of Functional Models of Synchronous Digital Systems”, IEEE Int. Conference on Computer Aided Design (ICCAD), Nov. 1990.
[10] B. Landwehr, P. Marwedel, and R. Dömer, “OSCAR: Optimum Simultaneous Scheduling, Allocation and Resource Binding Based on Integer Programming”, Proc. of European Conference on Design Automation, 1994.
[11] R. Camposano, “Structural Synthesis in the Yorktown Silicon Compiler”, VLSI, 1987.
[12] F. Vahid, “Procedure Exlining: A Transformation for Improved System and Behavioral Synthesis”, International Symposium on System Synthesis, 1995.
[13] T. Ly, D. Knapp, R. Miller and D. MacMillen, “Scheduling using Behavioral Templates”, Design Automation Conference, p. 101-106, 1995.
[14] P. Panda and N. Dutt, “1995 High Level Synthesis Design Repository”, International Symposium on System Synthesis, 1995.