Midas: Using data-transfers in high-level synthesis

Shantanu Tarafdar Synopsys Inc. 700 East Middlefield Road Mountain View, CA 94043

Miriam Leeser Dept. of Electrical and Computer Engineering Northeastern University Boston, MA 02115

Abstract

Memory accesses and data-transfers are the main bottleneck in system design today. Application-specific integrated circuits (ASICs) can address this issue, but high-level synthesis (HLS) systems, the computer-aided design tools for automatically generating ASIC architectures, typically treat memories and data-transfers as secondary in synthesis. In this paper, we introduce Midas, an HLS system that uses data-transfers as its primary entity for synthesis. This allows more realistic modelling of data issues during the core HLS steps: scheduling, allocation, and binding. We compare Midas with an HLS system that does not use data-transfers as its basic synthesis entity. Midas's architectures have near-equivalent execution unit characteristics and superior storage and data-transfer subsystem properties.

1 Introduction

The nature of integrated circuits being designed today is changing. Advances in fabrication technology are enabling extremely complicated algorithms to be mapped to silicon. Many of these new algorithms require high-throughput, memory-intensive systems. Manually designing architectures for these systems is untenable. High-level synthesis (HLS) is a branch of computer-aided design for application-specific integrated circuits (ASICs) that focuses on automatically generating an architecture for a system given a specification of its behavior.

In modern-day ASICs, storage and interconnect dominate the area; the area of execution units is small in comparison. Most HLS systems available today focus their effort on optimizing the execution unit, a misplaced priority in view of its small area contribution. These HLS systems usually synthesize the storage architecture and interconnect network in a secondary step after the execution unit has been synthesized. Some systems completely decouple execution unit synthesis from synthesis of the storage and data-transfer subsystem [1, 2, 3]. Others use simple models for these subsystems to account for effects on them during the synthesis of the execution unit [4, 5, 6]. Some HLS tools are specialized to generate extremely efficient architectures for algorithms that can be described as arrays and multidimensional loops [7, 5]. However, they are limited to applications described in that manner.

Researchers in the past have stressed the importance of data-transfers in HLS [8, 9]. The DT-model is a new method for formulating HLS using data-transfers as the basic entity in synthesis, as opposed to the traditional model that uses operations [10, 11]. This paper describes Midas, a high-level synthesis system built using the DT-model. We begin with an overview of Midas's synthesis flow, followed by a description of the design metrics that Midas uses. The main synthesis steps are DT partitioning, DT scheduling, DT binding, and resource allocation. Each of these is described in turn. We have run Midas on several examples including the elliptic filter and a motion estimation example. Our results show that use of the DT-model can result in architectures with smaller data storage area and fewer buses than those generated by conventional synthesis techniques.


2 Overview of Midas

Midas performs time-constrained HLS, which means it optimizes the ASIC area of the architecture it synthesizes given an upper bound on the execution time of the behavior. Midas's hardware model is a network of functional units and multi-ported register files connected by buses. Each functional unit terminal and register file port is multiplexed over the buses from which it can receive inputs. Figure 1 illustrates this hardware model.

Figure 1: Midas's hardware model. The datapath contains address generators (AG1-AG4) and multiported register files (MPRF1-MPRF4), which together form the storage architecture; multiplexers and buses, which form the data-transfer subsystem; and functional units (ADDER, ABS MAG, SUB, MULTIPLIER), which form the execution unit. A controller and clock drive the datapath.
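One way to make this hardware model concrete is as a small data model of the pieces in Figure 1. The sketch below, in Python, is purely illustrative; the class and field names are ours, not Midas's.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Bus:
    name: str

@dataclass
class Port:
    """A functional-unit input terminal or a register-file port, multiplexed
    over the buses listed in `sources` (its mux inputs)."""
    name: str
    sources: List[Bus] = field(default_factory=list)

@dataclass
class FunctionalUnit:
    name: str                     # e.g. "ADDER", "MULTIPLIER"
    inputs: List[Port] = field(default_factory=list)

@dataclass
class RegisterFile:
    name: str                     # e.g. "MPRF1"
    size: int = 0
    read_ports: List[Port] = field(default_factory=list)
    write_ports: List[Port] = field(default_factory=list)

@dataclass
class Architecture:
    buses: List[Bus] = field(default_factory=list)
    register_files: List[RegisterFile] = field(default_factory=list)
    functional_units: List[FunctionalUnit] = field(default_factory=list)
```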

Midas differs from traditional HLS systems in its use of the data-transfer model [11]. In this model, the basic entity for HLS algorithms is a data transfer (DT): a set consisting of the operation sourcing a piece of data and all the operations using it. Traditional HLS approaches use operations as their basic entity. The input to Midas is a dataflow graph (DFG) representing the behavior and the output is an architecture implementing the behavior. Midas synthesizes the architecture by scheduling, binding, and allocating one DT at a time.

Figure 2: Midas's synthesis flow. The input DFG is converted to the DT-model, yielding a DTFG. The synthesis loop then repeats the following steps: evaluate timeframes and necessary design metrics, partition DTs, evaluate all Midas design metrics, evaluate all (DT, cstep) scheduling action costs, schedule the minimum-cost (DT, cstep) pair, and bind the DT. When all DTs are scheduled, the loop ends, the DFG is annotated with design information, and the annotated DFG is output.

Figure 2 shows the synthesis flow through Midas. The main body of the flow is within a synthesis loop. At each iteration of the loop, design metrics are evaluated, DT partitioning is carried out, and a single DT is selected, scheduled, and bound. Resource allocation happens incrementally, with additional resources being allocated if they are necessary or if their introduction benefits the overall design. In the next few sections, we elaborate on each of these steps.
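The loop in Figure 2 can be pictured as repeatedly choosing the cheapest remaining (DT, cstep) action and committing it. The toy below, in Python, keeps only that loop shape: the cost function is a stand-in (earlier csteps are cheaper), precedence is modelled loosely, and partitioning, binding, and the real design metrics of Section 4 are omitted.

```python
def synthesize(dt_parents, num_csteps):
    """Toy stand-in for the Figure 2 loop.  dt_parents maps each DT to the DTs
    that must be scheduled no later than it (a loose precedence model)."""
    schedule = {}
    while len(schedule) < len(dt_parents):
        candidates = {}
        for dt, parents in dt_parents.items():
            if dt in schedule or not all(p in schedule for p in parents):
                continue
            earliest = max((schedule[p] for p in parents), default=1)
            for cstep in range(earliest, num_csteps + 1):
                candidates[(dt, cstep)] = cstep       # stand-in scheduling cost
        dt, cstep = min(candidates, key=candidates.get)
        schedule[dt] = cstep                          # "schedule" and "bind"
    return schedule

print(synthesize({"DT1": [], "DT2": ["DT1"], "DT3": ["DT1", "DT2"]}, num_csteps=4))
```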


3 Conversion to the DT-domain

Midas reads in a dataflow graph (DFG) representing the behavior. This is a directed, acyclic graph in which nodes represent operations that are performed in the behavior and edges represent data-dependencies between operations. Midas's very first action is to convert the DFG into its counterpart in the DT-model, a DT flowgraph (DTFG). This involves extracting the DTs from the DFG, determining the scheduling constraints between pairs of DTs, and then constructing a graph in which nodes represent DTs and the various types of edges represent different types of scheduling constraints between pairs of DTs. All the remaining synthesis steps in Midas operate on the DTFG.

3.1 Extraction of DTs

A DT is a set of operations that relate to the motion of one piece of data from its origin in the architecture to all its destinations. The source operation is the one that produces the data. This may be a functional operation, like an addition or multiplication, or a storage read operation. The rest of the operations are destination operations, which may be functional operations or storage write operations. In an unscheduled DFG, each node with outgoing edges is the source operation of a DT. The destination nodes of each of these outgoing edges are the destination operations of that DT. Note that a node from the DFG may be part of several DTs. When a DT is extracted from the DFG, it is initially a primary DT, one with only functional operations. Figure 3 illustrates the extraction of DTs from a DFG.


(a) A DFG with operations n1 through n6 and data values a, b, c, d, and f. (b) The ten DTs, DT1 through DT10, extracted from the DFG.

Figure 3: Extraction of DTs from a DFG
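The extraction step amounts to a simple traversal of the DFG: every node that drives at least one edge becomes the source of a primary DT whose destinations are the sinks of those edges. The Python sketch below illustrates this; the names (DataTransfer, extract_dts) are illustrative rather than Midas's actual interfaces, and the example graph is only loosely modeled on Figure 3, since the exact edges are not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class DataTransfer:
    """A primary DT: one source operation plus all operations consuming its result."""
    source: str
    destinations: list = field(default_factory=list)

def extract_dts(dfg):
    """Extract primary DTs from a DFG given as {node: [successor, ...]}.

    Every node with outgoing edges sources exactly one DT; a node may appear
    as a destination in several DTs because it consumes several pieces of data.
    """
    dts = []
    for node, successors in dfg.items():
        if successors:                      # only nodes whose data is used elsewhere
            dts.append(DataTransfer(source=node, destinations=list(successors)))
    return dts

# Illustrative DFG: inputs a..d and f feed operations n1..n6 (edges assumed).
dfg = {
    "a": ["n1"], "b": ["n3"], "c": ["n2"], "d": ["n3"], "f": ["n6"],
    "n1": ["n2", "n4"], "n2": ["n4", "n5"], "n3": ["n5"],
    "n4": ["n6"], "n5": ["n6"], "n6": [],
}
for dt in extract_dts(dfg):
    print(dt.source, "->", dt.destinations)   # yields ten DTs for this graph
```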

3.2 Scheduling constraints

The main scheduling directive of the DT-model is that all the operations of a DT be scheduled in the same cstep [10]. This gives rise to three types of scheduling constraints.

Direct precedence constraints: When a destination operation of a DT is the source operation of another DT, the two DTs must be scheduled in the same cstep with the first DT preceding the second within the cstep. The first DT is the parent of the second and the second is the child of the first.

Intra-partition precedence constraints: Primary DTs whose operations cannot be scheduled at the same cstep must be partitioned into a set of secondary DTs (DTs with storage accesses). Within a partition of a primary DT, the secondary DT containing the source operation must be scheduled at least one cstep prior to the rest of the secondary DTs.

Concurrency constraints: When two DTs share a common destination operation, they must be scheduled in the same cstep. The two DTs are siblings of one another.

3.3 Building a DTFG

A DT flowgraph is constructed by Midas and used as the intermediate representation for the behavior. Nodes of the DTFG represent either primary or secondary DTs. There are three types of edges in the DTFG, each type representing a type of scheduling constraint. Directed edges represent the precedence constraints and undirected edges represent the concurrency constraints. Figure 4 shows a DTFG built from the input DFG in Figure 3.

Figure 4: A DTFG with nodes DT1 through DT10.
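Two of the edge types can be derived directly from the extracted DTs. The sketch below builds on the DataTransfer objects from the extraction sketch above; it is illustrative only and does not reflect Midas's actual data structures. Intra-partition precedence edges arise only after a DT has been partitioned, so they are not created here.

```python
from itertools import combinations

def build_dtfg(dts):
    """Derive DTFG edges from a list of DataTransfer objects.

    Returns (precedence, concurrency):
      precedence  - directed edges (parent, child): a destination operation of
                    the parent is the source operation of the child.
      concurrency - undirected edges {a, b}: the two DTs share a destination
                    operation and are therefore siblings.
    DTs are identified by their source operation, which is unique per DT.
    """
    precedence = set()
    concurrency = set()
    for parent in dts:
        for child in dts:
            if child is not parent and child.source in parent.destinations:
                precedence.add((parent.source, child.source))
    for a, b in combinations(dts, 2):
        if set(a.destinations) & set(b.destinations):
            concurrency.add(frozenset((a.source, b.source)))
    return precedence, concurrency
```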

4 Design Metrics

Midas uses design metrics and heuristics to perform time-constrained HLS in the DT-domain.

4.1 DT scheduling timeframes

The scheduling timeframe of a DT is the set of csteps in which the DT may be scheduled. This is denoted as an interval in the schedule bound by the soonest (ASAP) and latest (ALAP) possible times at which the DT can be scheduled. Midas computes the timeframes of DTs and their constituent operations using the topology of the DTFG. The ASAP and ALAP times are derived from the following two rules:

- The ASAP time of a DT is greater than the latest ASAP time of any of its parents or any of the parents of its siblings.

- The ALAP time of a DT is less than the earliest ALAP time of any of its children.

Midas also considers combinational delay through DTs and previously scheduled parent, sibling, and child DTs while computing the ASAP and ALAP times. Timeframes are used in the computation of other design metrics of Midas.
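A minimal sketch of how these bounds might be propagated over a DTFG is given below in Python. It is one interpretation of the two rules, not Midas's implementation: the per-edge separation table sep is an assumption of the sketch (0 csteps for direct precedence, where parent and child share a cstep; 1 for intra-partition precedence), and combinational delay is ignored.

```python
def compute_timeframes(dts, parents, children, siblings, num_csteps, sep):
    """Relax ASAP/ALAP bounds over the DTFG until they stop changing.

    parents, children, siblings: dicts mapping each DT name to a list of
    related DT names.  sep[(a, b)] is the assumed minimum number of csteps by
    which DT a must precede DT b along a precedence edge (a knob introduced
    for this sketch, not something the paper spells out).
    """
    asap = {dt: 1 for dt in dts}
    alap = {dt: num_csteps for dt in dts}
    changed = True
    while changed:
        changed = False
        for dt in dts:
            # Rule 1: ASAP bounded by parents and by the parents of siblings.
            lo = [asap[p] + sep[(p, dt)] for p in parents[dt]]
            lo += [asap[p] + sep[(p, s)] for s in siblings[dt] for p in parents[s]]
            if lo and max(lo) > asap[dt]:
                asap[dt], changed = max(lo), True
            # Rule 2: ALAP bounded by children.
            hi = [alap[c] - sep[(dt, c)] for c in children[dt]]
            if hi and min(hi) < alap[dt]:
                alap[dt], changed = min(hi), True
    return {dt: (asap[dt], alap[dt]) for dt in dts}
```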

4.2 Distribution graphs

Distribution graphs are used in HLS to reflect the utilization or projected utilization of various resource types in the architecture [12]. A distribution graph for a resource type is a function that maps csteps in the schedule to a real number which reflects the number of resource instances of that type that the architecture requires. In a fully scheduled design, these numbers are all integers, but in a partially scheduled design they need not be. Distribution graphs are used to compute projected ASIC area. Midas maintains a distribution graph for each type of functional unit and one for each of buses, storage registers, storage read ports, and storage write ports. In addition, each allocated register file has three distribution graphs associated with it: one for its size, one for its read ports, and the third for its write ports.
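As a concrete illustration, the distribution graph for one resource type can be computed from DT timeframes in the usual force-directed style of [12]: each unscheduled DT spreads a total demand of 1 uniformly over its timeframe. The uniform-spread weighting is our assumption; the paper does not state the exact contribution model.

```python
def distribution_graph(timeframes, num_csteps):
    """Expected demand per cstep for one resource type.

    timeframes: {dt: (asap, alap)} for the DTs that need this resource type.
    A scheduled DT has asap == alap and contributes exactly 1 to that cstep.
    """
    dg = [0.0] * (num_csteps + 1)            # dg[1] .. dg[num_csteps]
    for asap, alap in timeframes.values():
        width = alap - asap + 1
        for cstep in range(asap, alap + 1):
            dg[cstep] += 1.0 / width
    return dg

# Example: timeframes (1, 2) and (2, 2) give dg[2] = 0.5 + 1.0 = 1.5.
print(distribution_graph({"DT1": (1, 2), "DT2": (2, 2)}, 3))
```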

4.3 Projected ASIC area

Projected ASIC area is the design metric most responsible for guiding Midas's scheduler and binder. It is an estimate of the area that will be required by the ASIC once the architecture has been completely synthesized. The projected ASIC area has three components: execution unit area, storage unit area, and interconnect area. Midas assumes that the maximum value of each distribution graph is the minimum number of resource instances of that type that will be needed by the architecture.

- The execution unit area is the sum of the maximums of the functional unit distribution graphs, each weighted with the area of the respective functional unit.

- The storage unit area is the sum of the projected areas of the register files. The area of each register file is estimated from its size and the number of its ports; the maximums of its distribution graphs are used to compute its projected area. These areas are then summed.

- The interconnect area is the sum of the areas of the buses in the architecture. The maximum of the distribution graph for buses is used as the number of buses in the architecture. The length of each bus is given a worst-case estimate: the sum of the projected length and width of the ASIC. The ASIC is assumed to be square, so the projected length and width are the square root of the projected area. Midas uses the sum of the execution unit area and storage unit area to project the dimensions of the ASIC.

The sum of the execution unit area, storage unit area, and interconnect area is the projected ASIC area. Midas minimizes this design metric.
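The three components translate almost directly into arithmetic. The sketch below assumes a routing-area coefficient per unit of bus length (bus_area_per_unit_length), a parameter we introduce for illustration, and takes register-file areas as already estimated from their size and port distribution graphs.

```python
import math

def projected_asic_area(fu_dgs, fu_areas, rf_areas, bus_dg, bus_area_per_unit_length):
    """Combine distribution-graph maxima into a projected ASIC area.

    fu_dgs:   {fu_type: distribution graph (list of floats)}
    fu_areas: {fu_type: area of one functional unit of that type}
    rf_areas: projected areas of the allocated register files
    bus_dg:   distribution graph for buses
    """
    # Execution unit area: per-type maxima weighted by the unit's area.
    eu_area = sum(max(fu_dgs[t]) * fu_areas[t] for t in fu_dgs)

    # Storage unit area: sum of projected register-file areas.
    su_area = sum(rf_areas)

    # Interconnect: the ASIC is assumed square with side sqrt(eu + su), and each
    # bus gets a worst-case length of projected length + width (= 2 * side).
    side = math.sqrt(eu_area + su_area)
    num_buses = max(bus_dg)
    ic_area = num_buses * (2 * side) * bus_area_per_unit_length

    return eu_area + su_area + ic_area
```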

4.4 Potential for partitioning

Midas tries to minimize the number of DTs being partitioned, since partitioning impacts the sizes of the register files in the final architecture. The potential for partitioning in a DTFG is the number of DTs that must be partitioned and a probability that at least one other DT must be partitioned as well. A DT's timeframe is related to those of its parents, siblings, and children based on the scheduling constraints. If it is possible to meet these constraints, the DT is scheduled without partitioning. Otherwise, the DT must be partitioned. The probability that a DT does not need to be partitioned is the ratio of the number of csteps in its timeframe at which it can be scheduled without violating scheduling constraints to the length of its timeframe. The complement of this is the probability that the DT needs to be partitioned. The potential for partitioning in the DTFG is the number of DTs that must be partitioned and the complement of the product of the probabilities that the remaining DTs need not be partitioned. Midas schedules to minimize the potential for partitioning.
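Read as a formula, the metric pairs a count with a probability. The sketch below follows that reading; feasible_csteps (the number of csteps in each DT's timeframe at which it can be scheduled without violating constraints) is assumed to be computed elsewhere, and treating a DT with no feasible cstep as one that "must be partitioned" is our interpretation.

```python
def partitioning_potential(timeframes, feasible_csteps):
    """Potential for partitioning of a DTFG.

    timeframes:      {dt: (asap, alap)}
    feasible_csteps: {dt: number of csteps in the timeframe at which the DT can
                      be scheduled without violating any scheduling constraint}

    Returns (must_partition, prob_some_other): the count of DTs that cannot be
    scheduled without partitioning, and the probability that at least one of
    the remaining DTs will also need partitioning.
    """
    must_partition = 0
    prob_all_ok = 1.0
    for dt, (asap, alap) in timeframes.items():
        length = alap - asap + 1
        ok = feasible_csteps[dt]
        if ok == 0:                      # no feasible cstep: must be partitioned
            must_partition += 1
        else:
            prob_all_ok *= ok / length   # probability this DT avoids partitioning
    return must_partition, 1.0 - prob_all_ok
```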

5 DT partitioning

5.1 What is DT partitioning?

The prime scheduling directive of the DT-model is that all the operations of a DT must be scheduled in the same cstep. This minimizes the amount of storage required in the architecture, but sometimes this is not possible.

Cstep period = 5 ns; add1 delay = 3 ns; DT delay = 4 ns