Carnegie Mellon
Design Flow for Regular Fabrics
Veerbhan Kheterpal
2004
Advisor: Prof. Pileggi
Electrical & Computer Engineering
Carnegie Mellon University
Design Flow for Regular Fabrics
Masters Project Report
By Veerbhan Kheterpal (veerbhan@cmu.edu)
Department of Electrical & Computer Engineering
December 2003
Contents
1 Introduction
2 Regular Placement
  2.1 CAD Flow
  2.2 Results
3 Regular Routing Models
  3.1 Structured Routing
  3.2 VIA Configurable Routing
4 A Regular Routing Framework
  4.1 Routing Architecture Description
  4.2 Routing Flow/Algorithm
  4.3 Experiments & Results
5 Conclusions
6 Acknowledgements
List of Figures
1 LUT based PLB for VPGA Fabric
2 CAD-flow for an ASIC-routed regular PLB-array
3 Structured routing on a grid template
4 Repeating unit for a via configurable fabric
5 Structure in Figure 4 with redundant vias
6 Routing flow
7 Performance and Via-count vs. Granularity
8 Firewire controller: Regular Placement, Via Configurable routed
9 Firewire controller: Regular Placement, ASIC routed
10 Area and Performance for various design methodologies
Abstract
In an effort to control the parameter variations, rising mask costs and systematic yield problems that are threatening the affordability of application-specific ICs, new forms of design regularity and structure have been proposed. For example, there has been speculation [10] that regular logic fabrics [19] based on regular geometry patterns [3] can offer an economic solution [8] to provide tighter control of variations and greater control of systematic manufacturing failures. An emphasis is being placed on developing new, regular logic fabrics that leverage the regularity and programmability of FPGAs, yet deliver a level of performance and density close to ASICs. We present a complete CAD flow that uses a combination of commercial and developed tools to provide regular implementations of designs (RTLs). These implementations exhibit high physical regularity in the silicon as well as the interconnect layers. The developed tools include a tech-mapper which exploits regularity in the design, a packing tool and a regular-router. We further propose new regular routing architectures and explore the various performance vs. manufacturability trade-offs. Results demonstrate that a more regular, restricted architecture can provide a substantial advantage in terms of manufacturability and predictability while incurring a modest performance penalty.
1 Introduction
As integrated circuit (IC) technologies continue forward into the nanoscale, the traditional approaches to application specific IC (ASIC) design are becoming ineffective. Most prominently, the costs of ASIC design and manufacturing, including the increasing costs of masks, are too large to amortize over the total volume of a low to medium volume product. Standard programmable IC products, such as FPGAs [5], offer a solution to rising design and mask costs by amortizing cost over multiple products and applications. Furthermore, FPGAs are based on regular logic structures that are small enough to tune and optimize for manufacturability and control of variations, and are then repeated many times to form a complete array. But while FPGAs, as well as other programmable ICs, offer a much simpler and more affordable implementation, it comes at a high cost in terms of performance and power. For example, an FPGA can be three times slower and can require ten times the area of a standard cell ASIC.
This nanoscale IC landscape requires new design styles and circuit fabrics which can leverage the advantages of new technology nodes and provide ASIC performance at an affordable cost. One proposed fabric is a Via Patterned Gate Array (VPGA) [19] [8], which is similar to an FPGA since it is comprised of an array of programmable logic blocks (PLBs), but is distinctly different in two ways. First, the logic configuration is performed by an application-specific via layer, rather than field programmed with SRAM switches. Secondly, using metal mask configuration of "switches," the routing is done on top of the PLBs, rather than beside the PLBs using active transistors. This significantly reduces the die-area overhead in contrast to dedicated routing channels in FPGAs. Programming the vias requires a small set of application specific via masks.
The choice of PLB architecture for the logic array is crucial. This problem has been well studied for FPGAs in [12] and [2]. A heterogeneous PLB architecture for an FPGA is generally preferred over a homogeneous one and has been shown to provide superior performance. Due to the high area and performance cost of SRAM switches, the granularity of PLB components needs to be low. For via configurable fabrics, the area and performance cost of via switches is low, which permits a more granular PLB architecture. We present VPGA trade-off results for a heterogeneous PLB based on a CAD-flow that uses a combination of commercial and developed tools. The results are compared in terms of GDS-II clean implementations of RTL netlists for a sea-of-PLBs with a specific PLB architecture. We explore the cost of regularity with a sea-of-PLBs style physical implementation using a fabric specific tech-mapper and a packing algorithm.
Due to the challenges posed by chemical-mechanical polishing (CMP) of nanoscale copper interconnects, there is evidence that more regular forms of routing may be required [11] [10] to produce reliable, predictable back-end-of-line (BEOL) metal patterns. There are an increasing number of commercial regular fabrics [7], including the eASIC architecture [6] which uses SRAM programmable PLBs with ASIC-style routing or via-configurable routing over the logic fabrics. Like FPGAs, these regular logic fabrics can be tuned and optimized for manufacturability and control of variations. Since the interconnect routing, area and performance can have a dominant impact on that of the IC, the regularity trade-offs for the BEOL metal routing must be explored as well. Numerous research studies have explored the performance trade-offs of FPGA routing architectures that employ a regular routing structure programmable via SRAM cells [12]. The major distinction from a regular logic fabric such as a VPGA is that FPGAs employ SRAM programmable switches and the total die-area is limited by the configurable interconnect area. In contrast, circuit fabrics that employ via-configurable interconnect do not suffer from this problem, since the overhead of extra switches (potential vias) is substantially less. For example, a VPGA can afford a much higher switch density per switch box, which translates to a much higher flexibility in routing. Furthermore, in an FPGA the number of switches used while routing a net is very critical due to the switch delay. For via configurable switches this delay impact is substantially less.
As the next step in assessing regularity trade-offs, we describe a routing framework that accommodates regular and structured routing architectures. This framework allows us to explore the cost-performance trade-offs associated with regularity in the BEOL metal architecture. In particular, we propose two regular routing models and compare them with traditional ASIC routing. We will refer to the first regular routing style as structured routing, whereby all of the metal masks are application specific, but the routing is highly restricted and fully populates the layers. For our second model we propose a structured routing that is based on via configurability, whereby only the via masks are application specific, and the metal masks are common for all applications. The metal routing is customized for a particular application by selecting via locations from among the numerous "potential via" options. Using the developed router, we explore a sea-of-PLBs style array routed using various routing styles. The routing styles explored include standard ASIC style routing, structured ASIC routing and via configurable routing. Results show that a modest performance penalty is incurred for a substantial improvement in the ability to predict the silicon and manufacturing realities. We further explore the impact of redundant vias, as well as the cost vs. advantage of other manufacturability enhancements.
The remainder of this thesis is organized in the following way. Section 2 describes the design flow for a sea-of-PLBs style regular placement with ASIC style routing along with the results. In Section 3, we outline the aforementioned routing models, followed by the proposed routing framework and results in Section 4. Section 5 presents some of our conclusions.
2 Regular Placement
Recently proposed fabrics such as the Via Patterned Gate Array (VPGA) [19] [8], Lightspeed [7], eASIC [6], etc., represent a compromise between FPGAs and standard cells by employing regular logic fabrics and interconnect structures, but customize the routing and logic function using via mask patterns instead of field-programmable CMOS switches and SRAM storage bits. References [19] and [8] showed that a simple replacement of the FPGA LUT (lookup table) would be the VPGA LUT, which is significantly faster than that for the FPGA, since there is one less level in the LUT tree, and there is a via connection to one of the supplies, rather than a connection through a much more resistive SRAM cell. Further, [19] [8] developed a heterogeneous PLB consisting of two 3-input NANDs, one via configurable 3-LUT, a flip flop and a couple of buffers. Figure 1 shows this LUT based PLB.
Figure 1: LUT based PLB for VPGA Fabric
2.1 CAD Flow
For a given PLB architecture, Figure 2 outlines the flow that we use to map an RTL-level description of a design onto a regular array of PLBs. First we use a restricted library of standard cells to obtain an ASIC-style detailed placement of the design. This part of the flow uses commercial CAD tools with the exception of a logic-compaction step. Next we legalize this placement by 'packing' the standard cells into an array of PLBs. Our legalization algorithm works with a cost function that minimizes perturbation of the ASIC-style placement, and thus minimizes any loss of performance or increase in area at this stage. Finally we perform ASIC-style global and detailed routing on this regular array of PLBs. In the rest of this section, we describe each stage of this flow in greater detail.
The restricted library of standard cells used in this flow consists of the component cells of the given PLB architecture, for example NAND3, 3-LUT, buffers and inverters. The library is further restricted in that each component cell has a fixed size which is chosen to give a good power-delay tradeoff. This corresponds to the size of that component cell in the PLB. The timing information for this library is generated by characterizing these cells using a commercial tool called CellRater from Silicon Metrics [14]. We use Design Compiler from Synopsys to do logic optimization and technology-mapping to this restricted library. Technology-mapping is followed by a compaction algorithm that reduces the area of the netlist by better utilizing the given PLB architecture. Our algorithm first finds clusters of logic or "supernodes" corresponding to functions with 3 or fewer inputs. This is done using a maxflow-mincut algorithm similar to Flowmap [9]. It then matches these computed supernodes to the appropriate combination of PLB components. This allows more logic to be collapsed into PLBs.
For both the PLB architectures that we considered, this compaction step resulted in a significant reduction in total gate area of about 15% on average. We then use a commercial tool called Dolphin from Monterey Design Systems [15] to perform physical synthesis and placement. The result of this stage is a detailed ASIC-style placement that has been optimized for performance, area and routability based on physical information. The resulting netlist includes logic changes and buffer insertion to meet timing constraints and area specifications.
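The supernode search in the compaction step can be illustrated with a small sketch. It is not the maxflow-mincut procedure used in this work: plain bottom-up cut enumeration is swapped in because it is shorter to show. For every gate it lists the sets of at most three signals from which the gate's function can be computed, which are the candidate 3-input supernodes. The netlist encoding (`fanins`), the function names and the toy circuit are assumptions made purely for illustration.

```python
from itertools import product

def topological_order(fanins):
    """Return nodes so that every node appears after all of its fanins."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for fanin in fanins.get(node, []):
            visit(fanin)
        order.append(node)

    for node in fanins:
        visit(node)
    return order

def enumerate_cuts(fanins, k=3):
    """Map each node to its set of cuts with at most k leaves.

    fanins maps a gate name to the list of signals feeding it;
    signals that never appear as a key are primary inputs.
    """
    cuts = {}
    for node in topological_order(fanins):
        node_cuts = {frozenset([node])}          # trivial cut: the node itself
        inputs = fanins.get(node, [])
        if inputs:
            # Combine one cut per fanin and keep the merged leaf sets
            # that still fit into a k-input function.
            for combo in product(*(cuts[i] for i in inputs)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Hypothetical toy netlist: g1 = f(a, b), g2 = f(g1, c), g3 = f(g2, d)
fanins = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"]}
cuts = enumerate_cuts(fanins, k=3)
for leaves in sorted(cuts["g2"], key=lambda c: (len(c), sorted(c))):
    print("g2 can be computed from", sorted(leaves))
```

In the toy netlist, the cone of g2 over {a, b, c} is a candidate for collapsing into a single 3-LUT, while the cone of g3 over all four inputs is rejected because it exceeds three leaves.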
[Figure 2 blocks: PLB Components (LUT, DFF, NAND3, ...); Cell Characterization (SM CellRater); Timing library; RTL; Synthesis, Mapping (Design Compiler); Regularity driven Logic compaction; Physical synthesis (Dolphin), ASIC Placement; Packing into array of PLBs; PLB Array ASIC Routing (Dolphin)]
Figure 2: CAD-flow for an ASIC-routed regular PLB-array
After obtaining this ASIC-style detailed placement, our next step is to 'legalize' it by packing the component cells into a regular array of PLBs. Our packing algorithm does this by recursive quadrisection. At each quadrisection level, the component cells are relocated to other regions of the chip depending on the availability of the corresponding resource. For example, if there are more 3-LUTs in a region of the chip compared to the resources available in the PLBs in that region, some 3-LUTs will be moved to the nearest region of the chip that has unused 3-LUTs available. The cost function used in this algorithm takes into consideration the criticality of the cells being moved and also tries to minimize perturbation of the ASIC-style placement. In order to further minimize the loss in performance due to the motion of the component cells, we use the packing algorithm in an iterative loop with the physical synthesis tool Dolphin. In each iteration, the packing algorithm restricts the locations of some of the components to regions of the chip that have unused resources available. Physical synthesis is repeated with these restrictions to produce new locations for the remaining components, and also to redo the buffer insertion and logic restructuring where necessary. This iteration loop is repeated until all the components have been allotted legal locations in the PLB array, and ensures that the performance degradation due to legalizing the ASIC-style placement is minimal. After legalization, we use Dolphin to perform ASIC-style custom global and detailed routing on the regular array of PLBs. We measure the final performance of the design by running static timing analysis in Dolphin with data from post-layout extraction.
2.2 Results
We used the flow described in Section 2.1 to implement different designs onto a gate-array of regular PLBs. In this section we compare the area and performance of gate-arrays using the heterogeneous LUT-based architecture shown in Figure 1.
We present comparisons for four different designs. The designs ALU, FPU, and Network switch are dominated by datapath, while the design Firewire is a small controller that is dominated by control logic. Table 1 shows the comparison of the final die-area and timing, respectively, for each of the following design flows:
ASIC is the Standard Cell ASIC flow using a commercial 0.13 micron library from ST Microelectronics.
ASIC (PLB Components Lib) is obtained if we skip the Packing step in the design flow described in Section 2.1. Essentially, it is the standard cell ASIC flow using a library which comprises the cells that make up each PLB.
PLB Array is the design flow described in Section 2.1. This produces a regular PLB array with ASIC-style custom routing.

Design                                Area (sq microns)   Av. Slack (ns) (Top 10)   Av. Slack (ns) (All)
Arithmetic Logic Unit (651 Gates)
  ASIC                                5600                -0.45                     -0.431
  ASIC (PLB Components Lib)           7800                -0.302                    -0.266
  PLB Array                           18225               -0.308                    -0.2794
Firewire Controller (4247 Gates)
  ASIC                                27027               -1.31                     -1.27285
  ASIC (PLB Components Lib)           40944               -1.44                     -1.35654
  PLB Array                           56250               -1.45                     -1.67535
Floating Point Unit (24k Gates)
  ASIC                                409600              -7.680                    -3.90513
  ASIC (PLB Components Lib)           423444              -8.794                    -3.92062
  PLB Array                           784995              -9.08                     -4.78379
Network switch (80k Gates)
  ASIC                                1752048             -2.521                    -1.9373
  ASIC (PLB Components Lib)           2088025             -2.671                    -2.2288
  PLB Array                           3294225             -3.09                     -2.4916
  PLB Array (2 LUTs per PLB)          2863500             -3.181                    -2.6434

Table 1: Standard Cell vs. PLB-Array

We show the final die area, and the worst slack averaged over the top 10 paths and over all the paths in the designs. The gate count for each design is given in units of equivalent 2-input NAND gates. The ASIC (PLB Components Lib) and PLB Array implementations have an area overhead compared to ASIC because the PLB-library elements (LUTs) are larger than the corresponding standard cells. Furthermore, for the PLB Array, not all the elements of a PLB are fully utilized. For example, the ALU does not use any Flip-Flops. The ASIC (PLB Components Lib) and ASIC implementations have very comparable performance, demonstrating that the PLB-library is competitive in terms of delay.
The PLB Array has degraded performance due to cell movement during packing as well as increased chip area. Results indicate that the components to be chosen for the PLB are dependent on the application being mapped. For example, we mapped the Network Switch to a PLB-array where each PLB contains 2 LUTs. This design had substantially less die-area, indicating the need for a higher number of combinational elements per PLB.
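To make the area comparison concrete, the short sketch below computes the die-area ratio of each PLB Array implementation relative to the standard cell ASIC, using only the area values listed in Table 1. The dictionary layout and the helper name `area_ratio` are illustrative choices, not part of the flow.

```python
# Die areas in square microns, copied from Table 1.
AREA = {
    "ALU": {"ASIC": 5600, "PLB Array": 18225},
    "Firewire": {"ASIC": 27027, "PLB Array": 56250},
    "FPU": {"ASIC": 409600, "PLB Array": 784995},
    "Network switch": {
        "ASIC": 1752048,
        "PLB Array": 3294225,
        "PLB Array (2 LUTs per PLB)": 2863500,
    },
}

def area_ratio(design, flow, baseline="ASIC"):
    """Die area of `flow` divided by die area of `baseline` for one design."""
    return AREA[design][flow] / AREA[design][baseline]

for design in AREA:
    print(f"{design}: PLB Array / ASIC = {area_ratio(design, 'PLB Array'):.2f}")

# Effect of a second LUT per PLB on the Network switch die area.
two_lut = area_ratio("Network switch", "PLB Array (2 LUTs per PLB)", baseline="PLB Array")
print(f"Network switch: 2-LUT PLB / 1-LUT PLB = {two_lut:.2f}")
```

With these numbers the PLB Array overhead shrinks from roughly 3.3x for the small ALU to roughly 1.9x for the two largest designs, and the 2-LUT PLB reduces the Network switch die area by about 13%.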
3 Regular Routing Models
ASICs employ a fully customized routing model that requires complex design rules and application-specific masks for each routing layer. This routing flexibility is potentially at the cost of systematic yield failures. Moreover, while total wirelength is minimized as compared to a more restricted routing style, this can often result in a high number of interlayer vias due to the meandering nature of the shortest wiring paths. This problem is becoming worse at nanoscale, where total wiring can often dominate total area, and predicting congestion a priori is extremely difficult. Moreover, as via failure rates and parasitic resistances become increasingly dominant for shrinking feature sizes, this lack of wiring predictability becomes increasingly problematic for timing closure and design for manufacturability. Structured ICs, such as FPGAs which are an extreme example of this category, employ a fixed set of metal and via masks that are field programmable using SRAM programmable switches. These fixed structures provide regular repeating patterns in each of the metal mask layers which allow for extensive optimization and tuning for manufacturability and control of systematic failures. Therefore, we will first consider the performance and area penalty associated with a similarly "structured" routing architecture that also provides regular metal patterns and densities yet offers ASIC-like configurability.
3.1 Structured Routing
Figure 3 shows an example of a fully structured routing model as it appears on a gridded template. The router uses resources from the underlying template and is free to "cut" metal lines to desired lengths as required by the application. Although this produces fully customized non-repeating shapes in the routing layer, the router is based on rules which do not allow it to produce undesirable structures. The pieces of metal which are not used for routing remain in the metal mask, thus producing a uniform metal density. Referring to the grid template in Figure 3, each grid-point is a "potential via" site. The router is restricted to work on a strict grid, whereby wrong-way wiring and arbitrary spacing and sizing are prohibited. Parameters such as metal overhang at via sites and relative positioning of vias are pre-configured. "Potential vias" at consecutive rows or columns are prohibited in our model to allow for more compact grid-spacing. Clearly, the packing density of the routes, as well as the total interconnect resistance and interconnect capacitance, will be worse than those for the ASIC routing model. As a result there is some amount of performance penalty incurred when using this model. This loss in performance, however, is traded off against the improvement in manufacturability.
[Figure 3 labels: Required length, on-grid; Metal overhang at via site; Unutilized resources]
Figure 3: Structured routing on a grid template.
3.2 VIA Configurable Routing
By further restricting the structured routing model and not allowing the router to "cut" metal lines at arbitrary locations, we obtain the via configurable routing model. The only freedom that the router has is the selection of vias to complete a route. This additional routing structure clearly incurs additional performance and area penalties, but provides two important advantages: 1) application-specific customization for low volume products requires only a set of customized via masks, or potentially provides for direct write configurability [13]; 2) the number of unique via patterns that must be printed can be better controlled and optimized for printability. We will further show in the following section that adding redundant vias can be handled more effectively as well. With fixed metal masks, however, there is a decrease in routing flexibility, along with a penalty of dangling metal capacitance. This extra capacitance results in an additional performance degradation that we will assess in the results section. Figure 4 shows an example of a via configurable routing architecture.
The basic repeating unit is an alternating set of short and long metal tracks with a set of jumpers at the end of each track. Each overlap of M3 and M4 tracks is a "potential via" site. M3 tracks also serve as access tracks for pins on each gate, which are in M2. Each repeating unit can be viewed as a switch box with tracks entering and leaving. The jumpers provide for paths to enter the switch box and continue in the same direction. Switching a track orthogonally within the switch box is costly in terms of the dangling capacitance incurred at the source track being switched.
[Figure 4 label: Jumpers]
Figure 4: Repeating unit for a via configurable fabric.
The granularity of this structure is determined by the parameters W and H. Choosing a large value for these parameters (lower granularity) causes a higher dangling capacitance when switching tracks orthogonally but provides less resistive paths in the same direction. Choosing low values of W and H (high granularity) reduces the dangling capacitance but increases the resistance of the paths since more vias are used while routing. Thus the granularity can be optimally chosen for a particular application domain and process. Architecturally, it is better to have this structure with higher granularity in the lower layers (M3 & M4) and have a similar structure with lower granularity in the higher metal layers (M5 & M6). Routing that is relatively local can be done in M3 and M4 and longer routes can be done in M5 and M6.
The "potential vias" at the jumper sites are more often used during routing than those at the other sites. They also form potential via failure locations due to long metal lines terminating in vias. The structure shown in Figure 4 can be easily extended to accommodate redundant vias at the jumper sites, as shown in Figure 5. The packing density of the tracks in the structure with redundant vias is only slightly lower than the one without redundant vias.
[Figure 5 label: Redundant Via Sites]
Figure 5: Structure in Figure 4 with redundant vias.
4 A Regular Routing Framework
The proposed structured routing and via configurable routing architecture models lie in between ASIC and FPGA routing models on the axes of flexibility, manufacturability and performance. We represent both routing models in terms of a resource graph which is similar to the routing resource graphs used for FPGAs. The difference is that the resource graph lies on top of logic blocks instead of within dedicated routing channels. Moreover, the switch density as well as the packing density of the routes is much higher for via patternable regular fabrics than for FPGAs, and more comparable to that for ASICs. Therefore, routing methods used in ASICs or FPGAs cannot be directly applied. For this reason we have developed a routing framework which accommodates the proposed routing models and allows us to compare their performance with that for ASICs. The generalized routing architecture (resource graph) is described using an ASCII description language.
4.1 Routing Architecture Description
Along with the placement of cells in the design, the router takes in an architecture description file. This file describes the metal structures that are available for routing. The architecture description language has primitives such as Rectangle and Polygon to define rectangular wires as well as polygons suitable for Manhattan style routing. The Via and PVia primitives are used to define a via and a "potential via" respectively. Sections are used to group various primitives and instances of other Sections. Sections can be parameterized with architectural parameters such as the number of wires spanning various lengths, the number of redundant vias, etc. This provides a convenient way of changing various parameters and observing the effect on the overall performance. The Instantiate statement can be used to create instances of Sections and Primitives in a certain layer at a desired coordinate. Loop statements are used to instantiate defined Sections and Primitives repeatedly. Such an architecture description can be used to define a complex metal structure suitable for the via programmable model, or a simple grid for the structured routing model.
4.2 Routing Flow/Algorithm
In order to compare the performance characteristics of the proposed routing models with standard ASIC style routing, we developed a routing tool which addresses the requirements of the proposed routing models. Figure 6 shows the flow within the proposed routing tool. We use a rip-up reroute scheme [4] to resolve the order in which the nets should be routed. The routing architecture is described using the architecture description language. This architecture description is abstracted out into a resource routing graph which uniquely represents the architecture. Metal wires are abstracted as nodes and "potential vias" are abstracted as edges in this graph. Finding a path between any two nodes in this graph corresponds to completing a connection between the corresponding metal wires by selecting a set of "potential vias".
[Figure 6 blocks: Underlying Metal Architecture; Resource graph; Global routing (Dolphin); Initialize bounding boxes and net-violation costs; For each net: Min-cost search within the bounding box; Rip-up reroute; Refine bounding box constraints and net-violation costs]
Figure 6: Routing flow.
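The resource-graph abstraction and the min-cost search step of Figure 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the router developed in this work: each "potential via" carries a single scalar cost (standing in for the resistance, capacitance and net-violation terms of the real cost function), the bounding box is modeled as a plain set of allowed wires, and the wire names and function names are hypothetical.

```python
import heapq

def build_resource_graph(potential_vias):
    """Wires are nodes; each potential via is an undirected weighted edge."""
    graph = {}
    for wire_a, wire_b, cost in potential_vias:
        graph.setdefault(wire_a, []).append((wire_b, cost))
        graph.setdefault(wire_b, []).append((wire_a, cost))
    return graph

def min_cost_path(graph, source, target, allowed=None):
    """Dijkstra search from source to target.

    allowed: optional set of wires inside the net's bounding box; wires
    outside it are never entered. Returns (cost, [wires]) or None.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, wire = heapq.heappop(heap)
        if wire == target:
            path = [wire]
            while wire in prev:          # walk back to the source
                wire = prev[wire]
                path.append(wire)
            return d, path[::-1]
        if d > dist.get(wire, float("inf")):
            continue                     # stale heap entry
        for nxt, cost in graph.get(wire, []):
            if allowed is not None and nxt not in allowed:
                continue
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = wire
                heapq.heappush(heap, (nd, nxt))
    return None

# Hypothetical fabric fragment: two M3 wires joined through either of two M4 wires.
graph = build_resource_graph([
    ("m3_a", "m4_x", 1.0), ("m4_x", "m3_b", 1.0),
    ("m3_a", "m4_y", 0.5), ("m4_y", "m3_b", 0.5),
])
print(min_cost_path(graph, "m3_a", "m3_b"))
print(min_cost_path(graph, "m3_a", "m3_b", allowed={"m3_a", "m4_x", "m3_b"}))
```

Each edge on the returned path corresponds to one selected "potential via". Restricting `allowed` forces the search onto the costlier M4 wire, which is how a bounding box constraint changes the embedding of a net in this simplified model.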
The routing is broken into a global routing and a detailed routing phase. We use the commercial physical synthesis tool "Dolphin" from Monterey Design Systems [15] to perform the array placement and generate the global routing information. The global routes are used to create bounding boxes for each net. Figure \ref{detailrouting} shows the bounding box for the global route, which is overlaid on the resource graph. Each bounding box represents an area within which the corresponding net would preferably appear after detailed routing. Along with the bounding box, a net-violation cost is defined for each net. This cost represents the penalty incurred by another net if it is routed using any of the resources used by this net. The initial values of the net-violation costs are the same for each net.
The router proceeds by finding a min-cost embedding for each net within the bounding box defined for it. For a two point net, this embedding reduces to finding a minimum cost path between the source and the destination, and the path must lie within the bounding box. For a multipoint net, a minimum cost path between a node and any of the nodes in a set of nodes has to be found. The factors contributing to this cost function for embedding each net are the total resistance and the capacitance of the embedded net as well as the net-violation penalties of the nets it violates. Following the embedding of each net, the number of violations for each net is calculated. Based on this number and the criticality of the net, the bounding box and violation cost are recomputed. Another iteration of embedding the nets is carried out using the newly computed bounding boxes and net-violation costs. The router stops if all the embedded nets are free of any violations or a maximum number of iterations is reached. The above scheme, when used with the structured routing model, allows the router to create new nodes in the resource routing graph in order to model "slotting" of wires at desired places. This feature of the router is switched off for the via configurable model.
4.3 Experiments & Results
Using the routing flow in Figure 6, we co~npared the performance of the standard ASIC routing model with structured and via configurable routing. We: further explored the advantages of redundant vias at the jumper sites of the via configurable routing fabrics, as shownin Figure 5. Webegin with an RTLcircuit description and produce the placement in the form of a regular array of logic blocks, as descritbed in Section 2 above. Each logic block: consists of a Look-up I:able, two NAN[)gates,
Flip-flop and a pair oi’ buffers. Each placed design is routed using the structured model ai~d the via configurable model using our routing framework. The configured metal architectures back into "Dolphin" to facilitate
netlist
are incorporated
checking and post-routing timing reports. The ASICrouting
example is generated using "Dolphin’s" standard flow. To produce a regular placement of the RTLnetlists, tion 2. Weused a six-Metal,
we used the design-flow discussed in Sec-
130 nanometer comme~:cial process from ST Microelectronics.
The
structured routing was set up for a uniform grid ow::r the entire routing space in each of M3.. lVI4, M5 and M6. For the via configurable model we used the architecture
described in Section 3.2. The W
and H values for the architecture were carefully’ chosen to provide a good eugineering trade-off point. Figure 7 shows the performance plot of a firewire controller
which was routed using different values
of W(H was chosen to be’, equal to W). Large values o1~ W(low granularity) cause a higher dangling capacitance when switching tracks orthogonally,
lhus degrading the performance. Lower values of
W(higher granularity) reduce the dangling capacilance but increase the resistance of the paths due to large number of vias again degrading the performance. Thus an optimal value of We~ists for minimal delay. As expected, via count monotonically decreases as the value of W increases.
For
this reason we selected a value of Wwhich provided a good via count without incurring a substantial delay penalty. Table 2 compares the proposed routing models wit~ the ASICrouting model. Each of the benchmarks was routed with the structured
model, the via configurable model {Via cfg -I, Figure 4) and
the Via configurable model with redundant vias (Via cfg-II,
Figure 5). Interconnect
Capacitance
and Inte:rconnect Resistance denote the total intercc, nnect capacitance and the resistance of the routes respectively. Weinclude the average slack of the top tea critical paths in the routed design. T]he slack is measured with respect to a cycle time of .5ns. The "’Non-Redundant Via Count" is the total nonredundant vias in the design. The last column in Ihe table shows the RandomYield loss due to Via failures.
We calculate the yield loss due to via failures using an exponential failure model in which the via-limited yield is given by exp(-a N), where a is the via failure rate in parts per billion and N is the number of non-redundant vias in the design. The performance of the structured and via configurable routed designs is comparable to that of the ASIC routed designs. The via configurable fabric has higher interconnect capacitance and resistance. The higher capacitance is due to the dangling ends of the metal wires used for routing. The higher resistance is mainly due to the high via count. On average, there is a 13.2% degradation in performance for the via configurable routed designs. The FPU suffers from a large performance degradation (5.1% for the Structured and 23.6% for the Via Configurable) due to the long logic depth of its critical path.

(Plot for the Firewire Controller: Average Delay and Via Count versus W in microns, for W from 4 to 11.)
Figure 7: Performance and Via-count vs. Granularity

The "Via cfg - II" architecture is particularly interesting since it has a significantly lower non-redundant via count, thus improving the yield. The "potential vias" at the jumper sites are used more often than those at other sites. On average, they constitute about 50% of the total vias in the design. The large number of redundant vias in the "Via cfg - II" designs leads to lower interconnect resistance and improved performance over the "Via cfg - I" architecture. Figures 8 and 9 show post-detailed-routing plots of the firewire controller routed using the "Via cfg - I" (Figure 4) and standard ASIC models, respectively. The via configurable architecture provides an extremely regular set of masks, which proves to be much better from a metallization point of view. Part of our future work will be to predict the improvement in long-range systematic failures due to the regular metal patterns and controlled metal wiring densities.
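The degradation percentages quoted above appear to be relative changes in average slack with respect to the ASIC routed design. That definition is not stated in the text; it is inferred here because it reproduces the 23.6% quoted for the FPU from the Table 2 slack values. A minimal Python sketch under that assumption (the function name is illustrative):

```python
def slack_degradation_pct(asic_slack_ns, other_slack_ns):
    """Relative worsening of average slack versus the ASIC design, in percent.

    ASSUMPTION: degradation is taken as the relative change in slack
    magnitude; the report does not spell out its formula.
    """
    reference = abs(asic_slack_ns)
    return (abs(other_slack_ns) - reference) / reference * 100.0


# Average slack (ns) of the ASIC and "Via cfg - I" designs, from Table 2.
table2_slack = {
    "Firewire Controller": (-1.45, -1.55),
    "Floating Point Unit": (-9.08, -11.22),
    "Network Switch": (-3.09, -3.76),
}

for design, (asic, via_cfg) in table2_slack.items():
    print(f"{design}: {slack_degradation_pct(asic, via_cfg):.1f}%")
```

The Arithmetic Logic Unit is left out because its "Via cfg - I" slack cannot be read reliably from Table 2.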
Design          Interconnect       Interconnect       Average Slack   Non-Redundant   Random Yield
                Capacitance (pF)   Resistance (KΩ)    (1-10) (ns)     Via Count       Via Limited

Arithmetic Logic Unit (614 nets)
ASIC            11.14              37.81              -0.301          3163            0.999955
Structured      12.95              41.81              -0.301          4911            0.999931
Via cfg - I     19.58              90.04              -0.?1           12017           0.999832
Via cfg - II    19.6               76.33              -0.?1           4110            0.999942

Firewire Controller (1783 nets)
ASIC            27.89              97.13              -1.45           15560           0.999782
Structured      29.43              102.01             -1.456          17631           0.999753
Via cfg - I     38.19              164.31             -1.55           26775           0.999625
Via cfg - II    38.61              144.33             -1.54           16410           0.99977

Floating Point Unit (25322 nets)
ASIC            311.07             1003.5             -9.08           132K            0.998154
Structured      362.67             1112.9             -9.514          172K            0.997595
Via cfg - I     598.36             2185.4             -11.22          339K            0.995265
Via cfg - II    607.24             1835.6             -11.16          158K            0.99779

Network Switch (64149 nets)
ASIC            1828.9             4854.3             -3.09           564K            0.992135
Structured      2273.4             6389.1             -3.28           767K            0.989319
Via cfg - I     3924.1             144K               -3.76           1384K           0.980811
Via cfg - II    4001.3             131K               -3.67           719K            0.989984

Table 2: ASIC, Structured and Via configurable routing
As the results demonstrate, the random yield loss penalty due to the increased number of vias is negligible, especially if the redundant via re-configuration scheme is applied (Via cfg - II). The overall yield should be significantly higher, though, since the systematic and parametric yield components are maximized by using the regular metallization structures. Moreover, this will also provide much more robust designs, since there will be less variability in the printed feature dimensions and topography as a function of process windows.
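To make the yield column of Table 2 concrete, the exponential model exp(-a N) can be evaluated directly. The sketch below is illustrative only: the report does not state the value of a, and the 14 parts per billion used here is an assumption chosen because it approximately reproduces the tabulated yields. The via counts are the Firewire Controller entries of Table 2.

```python
import math


def via_limited_yield(non_redundant_vias, fail_rate_ppb):
    """Random via-limited yield exp(-a * N), with a given in parts per billion."""
    return math.exp(-fail_rate_ppb * 1e-9 * non_redundant_vias)


# ASSUMPTION: 14 ppb is not given in the report; it is a fitted value that
# roughly matches the "Random Yield Via Limited" column of Table 2.
ASSUMED_FAIL_RATE_PPB = 14.0

# Non-redundant via counts of the Firewire Controller from Table 2.
firewire_vias = {
    "ASIC": 15560,
    "Structured": 17631,
    "Via cfg - I": 26775,
    "Via cfg - II": 16410,
}

for model, n_vias in firewire_vias.items():
    y = via_limited_yield(n_vias, ASSUMED_FAIL_RATE_PPB)
    print(f"{model:13s} N={n_vias:6d}  yield={y:.6f}  loss={1.0 - y:.6f}")
```

Because a N is very small for these designs, the yield loss is close to a N itself, so it grows almost linearly with the non-redundant via count. This is why removing vias from the non-redundant count in "Via cfg - II" recovers most of the penalty.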
Figure 8: Firewire controller - Regular Placement, Via Configurable routed

Figure 9: Firewire controller - Regular Placement, ASIC routed
5 Conclusions
In this work we developed a CAD flow for a regular sea-of-PLB style placement and studied the performance and area trade-offs vs. regularity. We presented two routing models which address the growing need for regularity in design, motivated by the challenges involved in nanoscale IC manufacturing. Using the router that we developed, we explored various regular metal architectures and compared them with ASIC routing. Figure 10 shows the performance and area characteristics of the various application-specific IC design methodologies explored in this work.
(Plot for the Network Switch (80K Gates), Area and Performance. X-axis: Average Slack (Top 10 paths), from -4 to 0; y-axis scale from 0 to 200. Data points: PLB-Array Via Configurable Routing, PLB-Array ASIC Routed, PLB-Array Structured Routing, Standard Cell PLB Library, Standard Cell SI- Library.)
Figure 10: Area and Performance for various design methodologies
The performance penalty for the PLB array designs is modest compared to the standard cell designs, although there is a substantial area penalty due to unutilized resources in the PLBs. A large die area affects the feature-limited yield due to random faults. We hope to offset this with a good systematic yield due to the high structural regularity in the design, so that the number of good dies per wafer is still high.
6 Acknowledgements
I would like to express my most sincere gratitude to my advisor, Lawrence T. Pileggi. Larry, thank you for your unfailing guidance and support, in technical work and in other aspects of life. Special thanks also go to Andrzej Strojwas, Herman Schmit and Ruchir Puri from IBM T.J. Watson Research Center. Thank you for co-advising me. VPGA is a large project, and I am fortunate to have the opportunity to work with a dynamic team of fellow graduate students: Padmini Gopalakrishnan, Kim Yaw Tong, Aneesh Koorapaty, Chetan Patel, Vyacheslav (Slava) Rovner, and R. Reed Taylor. Thank you for your contributions to various aspects of the VPGA project, and most importantly, thank you for your friendship. Last but not least, I would like to thank my parents for everything.