Design Flow for Regular Fabrics

Veerbhan Kheterpal
2004

Advisor: Prof. Pileggi

Electrical & Computer Engineering
Carnegie Mellon University

Masters Project Report

By Veerbhan Kheterpal (veerbhan@cmu.edu), Department of Electrical & Computer Engineering

December 2003

Contents

1 Introduction
2 Regular Placement
  2.1 CAD Flow
  2.2 Results
3 Regular Routing Models
  3.1 Structured Routing
  3.2 VIA Configurable Routing
4 A Regular Routing Framework
  4.1 Routing Architecture Description
  4.2 Routing Flow/Algorithm
  4.3 Experiments & Results
5 Conclusions
6 Acknowledgements

List of Figures

1 LUT based PLB for VPGA Fabric
2 CAD-flow for an ASIC-routed regular PLB-array
3 Structured routing on a grid template
4 Repeating unit for a via configurable fabric
5 Structure in Figure 4 with redundant vias
6 Routing flow
7 Performance and Via-count vs. Granularity
8 Firewire controller: Regular Placement, Via Configurable routed
9 Firewire controller: Regular Placement, ASIC routed
10 Area and Performance for various design methodologies

Abstract

In an effort to control the parameter variations, rising mask costs and systematic yield problems that are threatening the affordability of application-specific ICs, new forms of design regularity and structure have been proposed. For example, there has been speculation [10] that regular logic fabrics [19] based on regular geometry patterns [3] can offer an economic solution [8] to provide tighter control of variations and greater control of systematic manufacturing failures. An emphasis is being placed on developing new, regular logic fabrics that leverage the regularity and programmability of FPGAs, yet deliver a level of performance and density close to ASICs. We present a complete CAD flow that uses a combination of commercial and developed tools to provide regular implementations of designs (RTLs). These implementations exhibit high physical regularity in the silicon as well as the interconnect layers. The developed tools include a tech-mapper which exploits regularity in the design, a packing tool and a regular-router. We further propose new regular routing architectures and explore the various performance vs. manufacturability trade-offs. Results demonstrate that a more regular, restricted architecture can provide a substantial advantage in terms of manufacturability and predictability while incurring a modest performance penalty.

1 Introduction

As integrated circuit (IC) technologies continue forward into nanoscale, the traditional approaches to application-specific IC (ASIC) design are becoming ineffective. Most prominently, the cost of ASIC design and manufacturing, including the increasing cost of masks, is too large to amortize over the total volume of a low to medium volume product. Standard programmable IC products, such as FPGAs [5], offer a solution to rising design and mask costs by amortizing cost over multiple products and applications. Furthermore, FPGAs are based on regular logic structures that are small enough to tune and optimize for manufacturability and control of variations, then repeat many times to form a complete array. But while FPGAs, as well as other programmable ICs, offer much simpler and more affordable implementation, it is at a high cost in terms of performance and power. For example, an FPGA can be three times slower and can require ten times the area of a standard cell ASIC.

This nanoscale IC landscape requires new design styles and circuit fabrics which can leverage the advantages of new technology nodes and provide ASIC performance at an affordable cost. One proposed fabric is a Via Patterned Gate Array (VPGA) [19] [8], which is similar to an FPGA since it is comprised of an array of programmable logic blocks (PLBs), but is distinctly different in two ways. First, the logic configuration is performed by an application-specific via layer, rather than field programmed with SRAM switches. Secondly, using metal mask configuration of "switches," the routing is done on top of the PLBs, rather than beside the PLBs using active transistors. This significantly reduces the die-area overhead in contrast to dedicated routing channels in FPGAs. Programming the vias requires a small set of application-specific via masks.

The choice of PLB architecture for the logic array is crucial. This problem has been well studied for FPGAs in [12] and [2]. A heterogeneous PLB architecture for an FPGA is generally preferred over a homogeneous one and has been shown to provide superior performance. Due to the high area and performance cost of SRAM switches, the granularity of PLB components needs to be low. For via configurable fabrics, the area and performance cost of via switches is low, which permits a more granular PLB architecture.

We present VPGA trade-off results for a heterogeneous PLB based on a CAD-flow that uses a combination of commercial and developed tools. The results are compared in terms of GDS-II clean implementations of RTL netlists for a sea-of-PLBs with a specific PLB architecture. We explore the cost of regularity with a sea-of-PLBs style physical implementation using a fabric specific tech-mapper and a packing algorithm.

Due to the challenges posed by chemical-mechanical polishing (CMP) of nanoscale copper interconnects, there is evidence that more regular forms of routing may be required [11] [10] to produce reliable, predictable back-end-of-line (BEOL) metal patterns. There are an increasing number of commercial regular fabrics [7], including the eASIC architecture [6] which uses SRAM programmable PLBs with ASIC-style routing or via-configurable routing over the logic fabrics. Like FPGAs, these regular logic fabrics can be tuned and optimized for manufacturability and control of variations. Since the interconnect routing, area and performance can have a dominant impact on that of the IC, the regularity trade-offs for the BEOL metal routing must be explored as well.

Numerous research studies have explored the performance trade-offs of FPGA routing architectures that employ a regular routing structure programmable via SRAM cells [12]. The major distinction from a regular logic fabric such as a VPGA is that FPGAs employ SRAM programmable switches and the total die-area is limited by the configurable interconnect area. In contrast, circuit fabrics that employ via-configurable interconnect do not suffer from this problem, since the overhead of extra switches (potential vias) is substantially less. For example, a VPGA can afford a much higher switch density per switch box, which translates to a much higher flexibility in routing. Furthermore, in an FPGA the number of switches used while routing a net is very critical due to the switch delay. For via configurable switches this delay impact is substantially less.

As the next step in assessing regularity trade-offs, we describe a routing framework that accommodates regular and structured routing architectures. This framework allows us to explore the cost-performance trade-offs associated with regularity in the BEOL metal architecture. In particular, we propose two regular routing models and compare them with traditional ASIC routing. We will refer to the first regular routing style as structured routing, whereby all of the metal masks are application specific, but the routing is highly restricted, and fully populates the layers. For our second model we propose a structured routing that is based on via configurability, whereby only the via masks are application specific, and the metal masks are common for all applications. The metal routing is customized for a particular application by selecting via locations from among the numerous "potential via" options. Using the developed router, we explore a sea-of-PLBs style array routed using various routing styles. The routing styles explored include the standard ASIC style routing, a structured ASIC routing and via configurable routing. Results show a modest performance penalty is incurred for a substantial improvement in the ability to predict the silicon and manufacturing realities. We further explore the impact of redundant vias, as well as the cost vs. advantage of other manufacturability enhancements.

The remainder of this thesis is organized in the following way. Section 2 describes the design flow for a sea-of-PLBs style regular placement with ASIC-style routing along with the results. In Section 3, we outline the aforementioned routing models, followed by the proposed routing framework and results in Section 4. Section 5 presents some of our conclusions.

2 Regular Placement

Recently proposed fabrics such as the Via Patterned Gate Array (VPGA) [19] [8], Lightspeed [7], eASIC [6], etc., represent a compromise between FPGAs and standard cells by employing regular logic fabrics and interconnect structures, but customize the routing and logic function using via mask patterns instead of field-programmable CMOS switches and SRAM storage bits. References [19] and [8] showed that a simple replacement of the FPGA LUT (lookup table) would be the VPGA LUT, which is significantly faster than that for the FPGA, since there is one less level in the LUT tree, and there is a via connection to one of the supplies, rather than a connection through a much more resistive SRAM cell. Further, [19] [8] developed a heterogeneous PLB consisting of two 3-input NANDs, one via configurable 3-LUT, a flip-flop and a couple of buffers. Figure 1 shows this LUT based PLB.

Figure 1: LUT based PLB for VPGA Fabric
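To make the idea of a via configurable 3-LUT concrete, the following minimal Python sketch models it as an 8-entry truth table. It assumes that each configuration bit corresponds to tying one leaf of the LUT tree to a supply (1 for VDD, 0 for GND) through a via; the function names `make_lut3` and `config_for` are hypothetical and are not taken from the report or from any tool.

```python
from itertools import product


def make_lut3(truth_bits):
    """Return a 3-input Boolean function defined by 8 configuration bits.

    truth_bits[i] is the output for inputs (a, b, c) with i = 4*a + 2*b + c.
    Assumption: in a via configured LUT, each bit stands for one leaf of the
    LUT tree being tied to VDD (1) or GND (0) by a via.
    """
    if len(truth_bits) != 8 or any(b not in (0, 1) for b in truth_bits):
        raise ValueError("a 3-LUT needs exactly 8 bits, each 0 or 1")
    bits = tuple(truth_bits)

    def lut(a, b, c):
        return bits[4 * a + 2 * b + c]

    return lut


def config_for(func):
    """Derive the 8 configuration bits that make a 3-LUT implement func."""
    return [int(bool(func(a, b, c))) for a, b, c in product((0, 1), repeat=3)]


# Example: configure the LUT as a 3-input majority function.
majority_bits = config_for(lambda a, b, c: (a + b + c) >= 2)
majority = make_lut3(majority_bits)
```

Any function of three or fewer inputs fits in one such table, which is why the compaction step described later looks for logic clusters with at most three inputs.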

2.1 CAD Flow

For a given PLB architecture, Figure 2 outlines the flow that we use to map an RTL-level description of a design onto a regular array of PLBs. First we use a restricted library of standard cells to obtain an ASIC-style detailed placement of the design. This part of the flow uses commercial CAD tools with the exception of a logic-compaction step. Next we legalize this placement by 'packing' the standard cells into an array of PLBs. Our legalization algorithm works with a cost function that minimizes perturbation of the ASIC-style placement, and thus minimizes any loss of performance or increase in area at this stage. Finally we perform ASIC-style global and detailed routing on this regular array of PLBs. In the rest of this section, we describe each stage of this flow in greater detail.

The restricted library of standard cells used in this flow consists of the component cells of the given PLB architecture, for example NAND3, 3-LUT, buffers and inverters. The library is further restricted in that each component cell has a fixed size which is chosen to give a good power-delay tradeoff. This corresponds to the size of that component cell in the PLB. The timing information for this library is generated by characterizing these cells using a commercial tool called CellRater from Silicon Metrics [14]. We use Design Compiler from Synopsys to do logic optimization and technology-mapping to this restricted library.

Technology-mapping is followed by a compaction algorithm that reduces the area of the netlist by better utilizing the given PLB architecture. Our algorithm first finds clusters of logic or "supernodes" corresponding to functions with three or fewer inputs. This is done using a maxflow-mincut algorithm similar to Flowmap [9]. It then matches these computed supernodes to the appropriate combination of PLB components. This allows more logic to be collapsed into PLBs. For both the PLB architectures that we considered, this compaction step resulted in a significant reduction in total gate area, of about 15% on average.

We then use a commercial tool called Dolphin from Monterey Design Systems [15] to perform physical synthesis and placement. The result of this stage is a detailed ASIC-style placement that has been optimized for performance, area and routability based on physical information. The resulting netlist includes logic changes and buffer insertion to meet timing constraints and area specifications.
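The supernode idea in the compaction step can be illustrated with a small Python sketch. Note that this is not the maxflow-mincut formulation the report uses: it is a simpler greedy cone-growing heuristic, swapped in only to show what a cluster with at most three external inputs looks like. The netlist format (a dict from gate name to its fanin list) and the function names are assumptions made for this example, and logic duplication for gates with fanout outside the cone is ignored.

```python
def cone_inputs(netlist, cone):
    """Signals that feed the cone from outside it (the cone's cut)."""
    return {src
            for gate in cone
            for src in netlist.get(gate, ())
            if src not in cone}


def grow_supernode(netlist, root, k=3):
    """Greedily absorb fanin gates into a cone rooted at `root`.

    A fanin gate is absorbed only if the enlarged cone still has at most
    k external inputs. Names missing from `netlist` are treated as primary
    inputs and can never be absorbed.
    """
    cone = {root}
    changed = True
    while changed:
        changed = False
        for src in sorted(cone_inputs(netlist, cone)):
            if src not in netlist:
                continue  # primary input
            trial = cone | {src}
            if len(cone_inputs(netlist, trial)) <= k:
                cone = trial
                changed = True
                break  # recompute the cut before trying further gates
    return cone, cone_inputs(netlist, cone)


# Three 2-input gates that together compute a function of only a, b, c.
netlist = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
cone, inputs = grow_supernode(netlist, "g3")
```

In this example all three gates collapse into one supernode with inputs a, b and c, so the cluster could be matched to a single 3-LUT.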

Figure 2: CAD-flow for an ASIC-routed regular PLB-array. (Stages shown in the figure: cell characterization of the PLB components (LUT, DFF, NAND3, ...) with SM-CellRater to produce a timing library; synthesis and mapping of the RTL with Design Compiler; regularity-driven logic compaction; physical synthesis and ASIC placement with Dolphin; packing into an array of PLBs; ASIC routing of the PLB array with Dolphin.)
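As a rough illustration of the packing (legalization) stage in this flow, the sketch below assigns each placed component cell to the nearest PLB site that still has a free slot of the matching type, letting the most critical cells choose first. This is a deliberately simple greedy stand-in and not the report's own packing algorithm; the data layout, the `legalize` name and the criticality values are all hypothetical.

```python
def legalize(cells, sites):
    """Greedy nearest-free-slot legalization.

    cells: list of (name, kind, x, y, criticality) from the ASIC-style
           placement, with x, y expressed in PLB-site coordinates.
    sites: dict {(col, row): {kind: free_slots}} describing the PLB array.
    Returns ({name: (col, row)}, total Manhattan displacement).
    """
    free = {pos: dict(caps) for pos, caps in sites.items()}
    placement, total_move = {}, 0
    # Most critical cells choose first, so they are perturbed the least.
    for name, kind, x, y, _crit in sorted(cells, key=lambda c: -c[4]):
        candidates = [p for p, caps in free.items() if caps.get(kind, 0) > 0]
        if not candidates:
            raise ValueError(f"no free {kind} slot left for {name}")
        best = min(candidates,
                   key=lambda p: (abs(p[0] - x) + abs(p[1] - y), p))
        free[best][kind] -= 1
        placement[name] = best
        total_move += abs(best[0] - x) + abs(best[1] - y)
    return placement, total_move


# Two PLBs, each with one 3-LUT and two NAND3 slots (as in Figure 1).
sites = {(0, 0): {"LUT3": 1, "NAND3": 2}, (1, 0): {"LUT3": 1, "NAND3": 2}}
cells = [("u1", "LUT3", 0, 0, 0.9),   # critical LUT, wants site (0, 0)
         ("u2", "LUT3", 0, 0, 0.2),   # less critical LUT, same wish
         ("u3", "NAND3", 1, 0, 0.5)]
placement, moved = legalize(cells, sites)
```

Here the less critical LUT `u2` is displaced by one site because the only 3-LUT slot at its preferred location is taken by the more critical `u1`.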

After obtaining this ASIC-style detailed placement, our next step is to 'legalize' it by packing the component cells into a regular array of PLBs. Our packing algorithm does this by recursive quadrisection. At each quadrisection level, the component cells are relocated to other regions of the chip depending on the availability of the corresponding resource. For example, if there are more 3-LUTs in a region of the chip compared to the resources available in the PLBs in that region, some 3-LUTs will be moved to the nearest region of the chip that has unused 3-LUTs available. The cost function used in this algorithm takes into consideration the criticality of the cells being moved and also tries to minimize perturbation of the ASIC-style placement.

In order to further minimize the loss in performance due to the motion of the component cells, we use the packing algorithm in an iterative loop with the physical synthesis tool Dolphin. In each iteration, the packing algorithm restricts the locations of some of the components to regions of the chip that have unused resources available. Physical synthesis is repeated with these restrictions to produce new locations for the remaining components, and to also redo the buffer insertion and logic restructuring where necessary. This iteration loop is repeated until all the components have been allotted legal locations in the PLB array, and ensures that the performance degradation due to legalizing the ASIC-style placement is minimal.

After legalization, we use Dolphin to perform ASIC-style custom global and detailed routing on the regular array of PLBs. We measure the final performance of the design by running static timing analysis in Dolphin with data from post-layout extraction.

2.2 Results

We used the flow described in Section 2.1 to implement different designs on a gate array of regular PLBs. In this section we compare the area and performance of gate arrays using the heterogeneous LUT-based architecture shown in Figure 1. We present comparisons for four different designs. The designs ALU, FPU, and Network switch are dominated by datapath, while the design Firewire is a small controller that is dominated by control logic. Table 1 shows the comparison of the final die-area and timing respectively, for each of the following design flows:

Design                           Area (sq microns)   Av. Slack, Top 10 (ns)   Av. Slack, All (ns)

Arithmetic Logic Unit (651 Gates)
  ASIC                                        5600                   -0.45                -0.431
  ASIC (PLB Components Lib)                   7800                   -0.302               -0.266
  PLB Array                                  18225                   -0.308               -0.2794

Firewire Controller (4247 Gates)
  ASIC                                       27027                   -1.31                -1.27285
  ASIC (PLB Components Lib)                  40944                   -1.44                -1.35654
  PLB Array                                  56250                   -1.45                -1.67535

Floating Point Unit (24k Gates)
  ASIC                                      409600                   -7.680               -3.90513
  ASIC (PLB Components Lib)                 423444                   -8.794               -3.92062
  PLB Array                                 784995                   -9.08                -4.78379

Network switch (80k Gates)
  ASIC                                     1752048                   -2.521               -1.9373
  ASIC (PLB Components Lib)                2088025                   -2.671               -2.2288
  PLB Array                                3294225                   -3.09                -2.4916
  PLB Array (2 LUTs per PLB)               2863500                   -3.181               -2.6434

Table 1: Standard Cell vs. PLB-Array

ASIC is the standard cell ASIC flow using a commercial 0.13 µm library from ST Microelectronics. ASIC (PLB Components Lib) is obtained if we skip the packing step of the design flow described in Section 2.1. Essentially, it is the standard cell ASIC flow using a library which comprises the cells that make up each PLB. PLB Array is the design flow described in Section 2.1. This produces a regular PLB array with ASIC-style custom routing. We show the final die area, and the worst slack averaged over the top 10 and over all the paths in the designs. The gate count for each design is given in units of equivalent 2-input NAND gates.

The ASIC (PLB Components Lib) and PLB Array implementations have an area overhead compared to ASIC because the PLB-library elements (LUTs) are larger than the corresponding standard cells. Furthermore, for the PLB Array, not all the elements of a PLB are fully utilized. For example, the ALU does not use any flip-flops. The ASIC (PLB Components Lib) and ASIC implementations have very comparable performance, demonstrating that the PLB-library is competitive in terms of delay.

The PLB Array has degraded performance due to cell movement during packing as well as increased chip area. Results indicate that the components to be chosen for the PLB are dependent on the application being mapped. For example, we mapped the Network Switch to a PLB-array where each PLB contains 2 LUTs. This design had substantially less die-area, indicating the need for a higher number of combinational elements per PLB.
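The area overheads discussed above can be read directly off Table 1. The short Python sketch below only redoes that arithmetic using the area figures from the table; the dictionary layout and function names are choices made for this example.

```python
# Die areas in square microns, copied from Table 1.
areas = {
    "ALU": {"ASIC": 5600, "PLB lib": 7800, "PLB array": 18225},
    "Firewire": {"ASIC": 27027, "PLB lib": 40944, "PLB array": 56250},
    "FPU": {"ASIC": 409600, "PLB lib": 423444, "PLB array": 784995},
    "Network switch": {"ASIC": 1752048, "PLB lib": 2088025,
                       "PLB array": 3294225, "PLB array 2-LUT": 2863500},
}


def overhead(design, flow):
    """Area of `flow` relative to the standard cell ASIC area."""
    return areas[design][flow] / areas[design]["ASIC"]


def two_lut_saving():
    """Fractional die-area reduction of the 2-LUT PLB array relative to
    the 1-LUT PLB array for the Network switch."""
    row = areas["Network switch"]
    return 1.0 - row["PLB array 2-LUT"] / row["PLB array"]


for design in areas:
    print(f"{design}: PLB array / ASIC = {overhead(design, 'PLB array'):.2f}x")
print(f"2-LUT PLB saving on Network switch: {100 * two_lut_saving():.1f}%")
```

On these numbers the PLB array costs roughly 1.9x to 3.3x the ASIC area, and moving to two LUTs per PLB recovers about 13% of the Network switch die area.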

3 Regular Routing Models

ASICs employ a fully customized routing model that requires complex design rules and application-specific masks for each routing layer. This routing flexibility is potentially at the cost of systematic yield failures. Moreover, while total wirelength is minimized as compared to a more restricted routing style, this can often result in a high number of interlayer vias due to the meandering nature of the shortest wiring paths. This problem is becoming worse at nanoscale, where total wiring can often dominate total area, and predicting congestion a priori is extremely difficult. Moreover, as via failure rates and parasitic resistances become increasingly dominant for shrinking feature sizes, this lack of wiring predictability becomes increasingly problematic for timing closure and design for manufacturability.

Structured ICs, such as FPGAs which are an extreme example of this category, employ a fixed set of metal and via masks that are field programmable using SRAM programmable switches. These fixed structures provide regular repeating patterns in each of the metal mask layers which allow for extensive optimization and tuning for manufacturability and control of systematic failures. Therefore, we will first consider the performance and area penalty associated with a similarly "structured" routing architecture that also provides regular metal patterns and densities yet offers ASIC-like configurability.

3.1 Structured Routing

Figure 3 shows an example of a fully structured routing model as it appears on a gridded template. The router uses resources from the underlying template and is free to "cut" metal lines to desired lengths as required by the application. Although this produces fully customized non-repeating shapes in the routing layer, the router is based on rules which do not allow it to produce undesirable structures. The pieces of metal which are not used for routing remain in the metal mask, thus producing a uniform metal density. Referring to the grid template in Figure 3, each grid-point is a "potential via" site. The router is restricted to work on a strict grid, whereby wrong-way wiring and arbitrary spacing and sizing are prohibited. Parameters such as metal overhang at via sites and relative positioning of vias are pre-configured. "Potential vias" at consecutive rows or columns are prohibited in our model to allow for more compact grid-spacing.

Clearly, the packing density of the routes, as well as the total interconnect resistance and interconnect capacitance, will be worse than those for the ASIC routing model. As a result there is some amount of performance penalty incurred when using this model. This loss in performance, however, is traded off against the improvement in manufacturability.

Figure 3: Structured routing on a grid template. (Labels in the figure: required length, on-grid; metal overhang at via site; unutilized resources.)

3.2 VIA Configurable Routing

By further restricting the structured routing model and not allowing the router to "cut" metal lines at arbitrary locations, we obtain the via configurable routing model. The only freedom that the router has is selection of vias to complete a route. This additional routing structure clearly incurs additional performance and area penalties, but provides two important advantages: 1) application-specific customization for low volume products requires only a set of customized via masks, or potentially provides for direct write configurability [13]; 2) the number of unique via patterns that must be printed can be better controlled and optimized for printability. We will further show in the following section that adding redundant vias can be handled more effectively as well. With fixed metal masks, however, there is a decrease in routing flexibility, along with a penalty of dangling metal capacitance. This extra capacitance results in an additional performance degradation that we will assess in the results section. Figure 4 shows an example of a via configurable routing architecture.

The basic repeating unit is an alternating set of short and long metal tracks with a set of jumpers at the end of each track. Each overlap of M3 and M4 tracks is a "potential via" site. M3 tracks also serve as access tracks for pins on each gate, which are in M2. Each repeating unit can be viewed as a switch box with tracks entering and leaving. The jumpers provide for paths to enter the switch box and continue in the same direction. Switching a track orthogonally within the switch box is costly in terms of the dangling capacitance incurred at the source track being switched.

Figure 4: Repeating unit for a via configurable fabric. (Label in the figure: jumpers.)

The granularity of this structure is determined by the parameters W and H. Choosing a large value for these parameters (lower granularity) causes a higher dangling capacitance when switching tracks orthogonally but provides less resistive paths in the same direction. Choosing low values of W and H (high granularity) reduces the dangling capacitance but increases the resistance of the paths since more vias are used while routing. Thus the granularity can be optimally chosen for a particular application domain and process. Architecturally, it is better to have this structure with higher granularity in the lower layers (M3 & M4) and have a similar structure with lower granularity in the higher metal layers (M5 & M6). Routing that is relatively local can be done in M3 and M4, and longer routes can be done in M5 and M6.

The "potential vias" at the jumper sites are more often used during routing than those at the other sites. They also form potential via failure locations due to long metal lines terminating in vias. The structure shown in Figure 4 can be easily extended to accommodate redundant vias at the jumper sites, as shown in Figure 5. The packing density of the tracks in the structure with redundant vias is only slightly lower than the one without redundant vias.

Figure 5: Structure in Figure 4 with redundant vias. (Label in the figure: redundant via sites.)
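The granularity trade-off described above can be illustrated with a toy model. The sketch below is not derived from the report's data: it assumes a single route of fixed length whose jumper-via count scales as 1/W and whose dangling capacitance scales with W, and it uses an R times C product as a stand-in for delay. Every constant and the function name `route_delay` are hypothetical, picked only so that the optimum falls inside the swept range.

```python
def route_delay(w, length=50.0, turns=4,
                r_wire=0.5, r_via=1.0, c_wire=0.2):
    """Toy R*C delay estimate for one route on a via configurable fabric.

    w      : track segment length W (microns); H is taken equal to W.
    length : total route length (microns).
    turns  : number of orthogonal switches, each leaving a dangling stub
             whose length is assumed proportional to w.
    All electrical constants are illustrative, in arbitrary units.
    Returns (delay estimate, number of jumper vias).
    """
    n_vias = 2.0 * length / w                    # two vias per jumper hop
    resistance = r_wire * length + r_via * n_vias
    capacitance = c_wire * length + c_wire * w * turns  # wire + dangling
    return resistance * capacitance, n_vias


sweep = {w: route_delay(float(w)) for w in range(4, 12)}
best_w = min(sweep, key=lambda w: sweep[w][0])
via_counts = [sweep[w][1] for w in sorted(sweep)]
```

With these made-up constants the delay is minimized at an intermediate W while the via count falls monotonically as W grows, which is the qualitative behaviour the text describes.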

4 A Regular Routing Framework

The proposed structured routing and via configurable routing architecture models lie in between the ASIC and FPGA routing models on the axes of flexibility, manufacturability and performance. We represent both routing models in terms of a resource graph which is similar to the routing resource graphs used for FPGAs. The difference is that the resource graph lies on top of the logic blocks instead of within dedicated routing channels. Moreover, the switch density as well as the packing density of the routes is much higher for via patternable regular fabrics than for FPGAs, and more comparable to that for ASICs. Therefore, routing methods used in ASICs or FPGAs cannot be directly applied. For this reason we have developed a routing framework which accommodates the proposed routing models and allows us to compare their performance with that for ASICs. The generalized routing architecture (resource graph) is described using an ASCII description language.

4.1 Routing Architecture Description

Along with the placement of cells in the design, the router takes in an architecture description file. This file describes the metal structures that are available for routing. The architecture description language has primitives such as Rectangle and Polygon to define rectangular wires as well as polygons suitable for Manhattan style routing. The Via and PVia primitives are used to define a via and a "potential via" respectively. Sections are used to group various primitives and instances of other Sections. Sections can be parameterized with architectural parameters such as the number of wires spanning various lengths, the number of redundant vias etc. This provides a convenient way of changing various parameters and observing the effect on the overall performance. The Instantiate statement can be used to create instances of Sections and Primitives in a certain layer at a desired coordinate. Loop statements are used to instantiate defined Sections and Primitives repeatedly. Such an architecture description can be used to define a complex metal structure suitable for the via programmable model, or a simple grid for the structured routing model.
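The report does not show the syntax of its description language, so the following Python sketch only mirrors the concepts named above (Rectangle, PVia, Section, Instantiate, Loop) as plain data classes. The class fields, the `instantiate` method and the `grid_section` helper are all hypothetical and should not be read as the actual language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rectangle:
    """A wire on one metal layer, given by its two end points (centerline)."""
    layer: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class PVia:
    """A "potential via" site between two adjacent metal layers."""
    lower: str
    upper: str
    x: float
    y: float


@dataclass
class Section:
    """A named group of primitives (and, in principle, other Sections)."""
    name: str
    items: list = field(default_factory=list)

    def instantiate(self, item):
        self.items.append(item)
        return item


def grid_section(n_tracks, pitch, length):
    """Simple M3/M4 grid: horizontal M3 wires, vertical M4 wires and a
    potential via at every crossing. n_tracks and pitch play the role of
    the architectural parameters of a parameterized Section."""
    sec = Section("grid")
    for i in range(n_tracks):            # stands in for a Loop statement
        pos = i * pitch
        sec.instantiate(Rectangle("M3", 0.0, pos, length, pos))
        sec.instantiate(Rectangle("M4", pos, 0.0, pos, length))
    for row in range(n_tracks):
        for col in range(n_tracks):
            sec.instantiate(PVia("M3", "M4", col * pitch, row * pitch))
    return sec


grid = grid_section(n_tracks=3, pitch=1.0, length=2.0)
```

Changing `n_tracks` or `pitch` regenerates the whole structure, which is the kind of parameter sweep the paragraph above describes.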

4.2 Routing Flow/Algorithm

In order to compare the performance characteristics of the proposed routing models with standard ASIC-style routing, we developed a routing tool which addresses the requirements of the proposed routing models. Figure 6 shows the flow within the proposed routing tool. We use a rip-up reroute scheme [4] to resolve the order in which the nets should be routed. The routing architecture is described using the architecture description language. This architecture description is abstracted out into a resource routing graph which uniquely represents the architecture. Metal wires are abstracted as nodes and "potential vias" are abstracted as edges in this graph. Finding a path between any two nodes in this graph corresponds to completing a connection between the corresponding metal wires by selecting a set of "potential vias".

Figure 6: Routing flow. (Stages shown in the figure: underlying metal architecture abstracted into a resource graph; global routing (Dolphin); initialize bounding boxes and net-violation costs; for each net, min-cost search within the bounding box; rip-up reroute; refine bounding box constraints and net-violation costs.)
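The resource graph abstraction and the min-cost search step of Figure 6 can be sketched in a few lines of Python. This is a minimal illustration using Dijkstra's algorithm, which is an assumption: the report does not name the search algorithm it uses. Wire names, per-via costs and the `allowed` set standing in for a bounding box are all hypothetical.

```python
import heapq


def min_cost_route(edges, source, target, allowed=None):
    """Cheapest connection between two wires in a resource graph.

    edges  : {(wire_a, wire_b): cost}, one entry per "potential via"
             joining two metal wires (nodes).
    allowed: optional set of wire names inside the net's bounding box;
             wires outside it are not explored.
    Returns (total_cost, list_of_vias_used), or None if unroutable.
    """
    adj = {}
    for (a, b), cost in edges.items():
        adj.setdefault(a, []).append((b, cost, (a, b)))
        adj.setdefault(b, []).append((a, cost, (a, b)))

    best = {source: 0.0}
    heap = [(0.0, source, [])]
    while heap:
        cost, wire, vias = heapq.heappop(heap)
        if wire == target:
            return cost, vias
        if cost > best.get(wire, float("inf")):
            continue  # stale heap entry
        for nxt, step, via in adj.get(wire, []):
            if allowed is not None and nxt not in allowed:
                continue
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, vias + [via]))
    return None


# Two M3 wires that can be joined through either of two M4 wires.
edges = {("m3_a", "m4_x"): 1.0, ("m4_x", "m3_b"): 1.0,
         ("m3_a", "m4_y"): 0.5, ("m4_y", "m3_b"): 3.0}
free_route = min_cost_route(edges, "m3_a", "m3_b")
boxed_route = min_cost_route(edges, "m3_a", "m3_b",
                             allowed={"m3_a", "m4_y", "m3_b"})
```

The returned via list is exactly the set of "potential vias" that would be turned into real vias for that net; restricting `allowed` forces the more expensive path through `m4_y`.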

The routing is broken into a global routing and a detailed routing phase. We use the commercial physical synthesis tool "Dolphin" from Monterey Design Systems [15] to perform the array placement and generate the global routing information. The global routes are used to create bounding boxes for each net. Figure [detailrouting] shows the bounding box for the global route, overlaid on the resource graph. Each bounding box represents an area within which the corresponding net would preferably appear after detailed routing. Along with the bounding box, a net-violation cost is defined for each net. This cost represents the penalty incurred by another net if it is routed using any of the resources used by this net. The initial values of the net-violation costs are the same for each net.

The router proceeds by finding a min-cost embedding for each net within the bounding box defined for it. For a two point net, this embedding reduces to finding a minimum cost path between the source and the destination, and the path must lie within the bounding box. For a multipoint net, a minimum cost path between a node and any of the nodes in a set of nodes has to be found. The factors contributing to this cost function for embedding each net are the total resistance and the capacitance of the embedded net as well as the net-violation penalties of the nets it violates.

Following the embedding of each net, the number of violations for each net is calculated. Based on this number and the criticality of the net, the bounding box and violation cost are recomputed. Another iteration of embedding the nets is carried out using the newly computed bounding boxes and net-violation costs. The router stops if all the embedded nets are free of any violations or a maximum number of iterations is reached. The above scheme, when used with the structured routing model, allows the router to create new nodes in the resource routing graph in order to model "slotting" of wires at desired places. This feature of the router is switched off for the via configurable model.

4.3 Experiments & Results

Using the routing flow in Figure 6, we compared the performance of the standard ASIC routing model with structured and via configurable routing. We further explored the advantages of redundant vias at the jumper sites of the via configurable routing fabrics, as shown in Figure 5. We begin with an RTL circuit description and produce the placement in the form of a regular array of logic blocks, as described in Section 2 above. Each logic block consists of a look-up table, two NAND gates, a flip-flop and a pair of buffers. Each placed design is routed using the structured model and the via configurable model using our routing framework. The configured metal architectures are incorporated back into "Dolphin" to facilitate netlist checking and post-routing timing reports. The ASIC routing example is generated using Dolphin's standard flow.

To produce a regular placement of the RTL netlists, we used the design flow discussed in Section 2. We used a six-metal, 130 nanometer commercial process from ST Microelectronics. The structured routing was set up for a uniform grid over the entire routing space in each of M3, M4, M5 and M6. For the via configurable model we used the architecture described in Section 3.2. The W and H values for the architecture were carefully chosen to provide a good engineering trade-off point. Figure 7 shows the performance plot of a Firewire controller which was routed using different values of W (H was chosen to be equal to W). Large values of W (low granularity) cause a higher dangling capacitance when switching tracks orthogonally, thus degrading the performance. Lower values of W (higher granularity) reduce the dangling capacitance but increase the resistance of the paths due to the large number of vias, again degrading the performance. Thus an optimal value of W exists for minimal delay. As expected, via count monotonically decreases as the value of W increases. For this reason we selected a value of W which provided a good via count without incurring a substantial delay penalty.

Table 2 compares the proposed routing models with the ASIC routing model. Each of the benchmarks was routed with the structured model, the via configurable model (Via cfg - I, Figure 4) and the via configurable model with redundant vias (Via cfg - II, Figure 5). Interconnect Capacitance and Interconnect Resistance denote the total interconnect capacitance and the resistance of the routes respectively. We include the average slack of the top ten critical paths in the routed design. The slack is measured with respect to a cycle time of 0.5 ns. The "Non-Redundant Via Count" is the total number of non-redundant vias in the design. The last column in the table shows the random yield loss due to via failures.

We calculate the yield loss due to via failures using an exponential failure model for which yield loss is modeled by exp(-aN), where a is the failure rate in parts per billion and N is the number of non-redundant vias in the design.

The performance of the Structured and Via configurable routed designs is comparable to the ASIC routed designs. The via configurable fabric has higher interconnect capacitance and resistance. The higher capacitance is due to the dangling ends of the metal wires used for routing. The higher resistance is mainly due to the high via count. On average, there is a 13.2% degradation in performance for the via configurable routed designs. The FPU suffers from a large performance degradation (5.1% for the Structured and 23.6% for the Via Configurable) due to the long logic depth of its critical path.

Figure 7: Performance and Via-count vs. Granularity. (Plot for the Firewire Controller; x-axis: W in microns, from 4 to 11; plotted series: Average Delay and Via Count.)

The "Via cfg - II" architecture is particularly interesting since it has a significantly lower non-redundant via count, thus improving the yield. The "potential vias" at the jumper sites are more often used than those at other sites. On average, they constitute about 50% of the total vias in the design. The large number of redundant vias in the "Via cfg - II" designs leads to the lowering of interconnect resistance and the improvement in performance over the "Via cfg - I" architecture. Figures 8 and 9 show post-detailed-routing plots of the Firewire controller routed using the "Via cfg - I" (Figure 4) and standard ASIC models respectively. The via configurable architecture provides an extremely regular set of masks, which proves to be much better from a metallization point of view. Part of our future work will be to predict the improvement in long-range systematic failures due to the regular metal patterns and controlled metal wiring densities.


Design          Interconnect   Interconnect   Average Slack   Non-Redundant   Random Yield
                Capacitance    Resistance     (1-10)          Via Count       Via Limited
                (pF)           (kΩ)           (ns)

Arithmetic Logic Unit (614 nets)
ASIC            11.14          37.81          -0.301          3163            0.999955
Structured      12.95          41.81          -0.301          4911            0.999931
Via cfg - I     19.58          90.04          -0.?1           12017           0.999832
Via cfg - II    19.6           76.33          -0.?1           4110            0.999942

Firewire Controller (1783 nets)
ASIC            27.89          97.13          -1.45           15560           0.999782
Structured      29.43          102.01         -1.456          17631           0.999753
Via cfg - I     38.19          164.31         -1.55           26775           0.999625
Via cfg - II    38.61          144.33         -1.54           16410           0.99977

Floating Point Unit (25322 nets)
ASIC            311.07         1003.5         -9.08           132K            0.998154
Structured      362.67         1112.9         -9.514          172K            0.997595
Via cfg - I     598.36         2185.4         -11.22          339K            0.995265
Via cfg - II    607.24         1835.6         -11.16          158K            0.99779

Network Switch (64149 nets)
ASIC            1828.9         4854.3         -3.09           564K            0.992135
Structured      2273.4         6389.1         -3.28           767K            0.989319
Via cfg - I     3924.1         144K           -3.76           1384K           0.980811
Via cfg - II    4001.3         131K           -3.67           719K            0.989984

Table 2: ASIC, Structured and Via configurable routing
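To make the overheads in Table 2 easier to compare across benchmarks, the short sketch below normalizes each row to its ASIC entry. The capacitance and via-count values are copied from Table 2; the helper name relative_to_asic and the dictionary layout are illustrative choices, not part of the reported flow.

```python
# Values copied from Table 2.
# Tuple order: ASIC, Structured, Via cfg - I, Via cfg - II.
MODELS = ("ASIC", "Structured", "Via cfg - I", "Via cfg - II")
TABLE2 = {
    "Arithmetic Logic Unit": {
        "cap_pf": (11.14, 12.95, 19.58, 19.6),
        "vias": (3163, 4911, 12017, 4110),
    },
    "Firewire Controller": {
        "cap_pf": (27.89, 29.43, 38.19, 38.61),
        "vias": (15560, 17631, 26775, 16410),
    },
    "Floating Point Unit": {
        "cap_pf": (311.07, 362.67, 598.36, 607.24),
        "vias": (132_000, 172_000, 339_000, 158_000),
    },
    "Network Switch": {
        "cap_pf": (1828.9, 2273.4, 3924.1, 4001.3),
        "vias": (564_000, 767_000, 1_384_000, 719_000),
    },
}


def relative_to_asic(values):
    """Scale a row so that the ASIC entry (index 0) equals 1.0."""
    base = values[0]
    return tuple(v / base for v in values)


for design, row in TABLE2.items():
    cap = relative_to_asic(row["cap_pf"])
    vias = relative_to_asic(row["vias"])
    print(design)
    for model, c, v in zip(MODELS, cap, vias):
        print(f"  {model:13s} capacitance x{c:.2f}  non-redundant vias x{v:.2f}")
```

For the Firewire controller, for example, the Via cfg - I capacitance comes out at roughly 1.37 times the ASIC value, and in every benchmark the Via cfg - II non-redundant via count is well below that of Via cfg - I.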

As the results demonstrate, the random yield loss penalty due to the increased number of vias is negligible, especially if the redundant via re-configuration scheme is applied (Via cfg - II). The overall yield should be significantly higher though, since the systematic and parametric yield components are maximized by using the regular metallization structures. Moreover, this will also provide much more robust designs, since there will be less variability in the printed feature dimensions and topography as a function of process windows.
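The exponential via-failure model behind the last column of Table 2 can be sketched as follows. The failure rate is not stated in this section; the value of 14 parts per billion used here is an assumption, chosen because it reproduces the tabulated yields closely, and the function name is illustrative.

```python
import math

# Assumption: the report does not give the failure rate in this section.
# 14 failures per billion vias is a guess that matches the yields in Table 2.
ASSUMED_FAILURE_RATE_PPB = 14.0


def via_limited_yield(non_redundant_vias,
                      failure_rate_ppb=ASSUMED_FAILURE_RATE_PPB):
    """Random via-limited yield exp(-a * N), with a given in parts per billion."""
    return math.exp(-failure_rate_ppb * 1e-9 * non_redundant_vias)


# Non-redundant via counts of the Firewire controller, copied from Table 2.
firewire_vias = {
    "ASIC": 15560,
    "Structured": 17631,
    "Via cfg - I": 26775,
    "Via cfg - II": 16410,
}

for model, n in firewire_vias.items():
    y = via_limited_yield(n)
    print(f"{model:13s} N={n:6d}  yield={y:.6f}  yield loss={1.0 - y:.6f}")
```

Because a*N is very small for these designs, the yield loss is approximately a*N itself, so it scales almost linearly with the non-redundant via count. This is why removing roughly ten thousand non-redundant vias in Via cfg - II brings its yield back to about the Structured level.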


Figure 8: Firewire controller - Regular Placement, Via Configurable routed


Figure 9: Firewire controller - Regular Placement, ASIC routed

5 Conclusions

In this work we developed a CAD flow for a regular sea-of-PLB style placement and studied the performance and area trade-offs vs. regularity. We presented two routing models which address the growing need for regularity in design, motivated by the challenges involved with nanoscale IC manufacturing. Using the router that we developed, we explored various regular metal architectures and compared them with ASIC routing. Figure 10 shows the performance and the area characteristics of the various application-specific IC design methodologies explored in this work.

Figure 10: Area and Performance for various design methodologies (Network Switch, 80K gates; x-axis: Average Slack (Top 10 paths); y-axis: area; data points: PLB-Array Via Configurable Routing, PLB-Array ASIC Routed, PLB-Array Structured Routing, Standard Cell with PLB Library, and Standard Cell with a second library)

The performance penalty for the PLB array designs is modest compared to the standard cell designs, although there is a substantial area penalty due to unutilized resources in the PLBs. A large die area affects the feature-limited yield due to random faults. We hope to offset this by a good systematic yield due to the high structural regularity in the design, so that the number of good dies per wafer is still high.
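The argument above can be made concrete with a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not a result of this work: it uses the standard Poisson model for random-defect yield, ignores wafer edge loss and scribe lanes, and every number in it (die areas, defect density, wafer area, systematic yields) is hypothetical.

```python
import math


def good_dies_per_wafer(die_area_mm2, systematic_yield,
                        defect_density_per_mm2=0.002,
                        wafer_area_mm2=70000.0):
    """Estimate good dies per wafer; edge loss and scribe lanes are ignored."""
    gross_dies = math.floor(wafer_area_mm2 / die_area_mm2)
    # Poisson model: random-defect yield falls exponentially with die area.
    random_yield = math.exp(-die_area_mm2 * defect_density_per_mm2)
    return gross_dies * random_yield * systematic_yield


def break_even_systematic_yield(small_area_mm2, small_systematic_yield,
                                large_area_mm2):
    """Systematic yield the larger die needs to match the smaller die."""
    target = good_dies_per_wafer(small_area_mm2, small_systematic_yield)
    per_unit_yield = good_dies_per_wafer(large_area_mm2, 1.0)
    return target / per_unit_yield


# Hypothetical comparison: a 50 mm^2 standard cell die with 70% systematic
# yield against a 65 mm^2 PLB-array die.
needed = break_even_systematic_yield(50.0, 0.70, 65.0)
print(f"PLB-array die breaks even at a systematic yield of {needed:.3f}")
```

Under these made-up numbers the larger die loses both gross dies and random-defect yield, so it only matches the smaller die if regularity raises its systematic yield well above that of the standard cell design.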


6 Acknowledgements

I would like to express my most sincere gratitude to my advisor, Lawrence T. Pileggi. Larry, thank you for your unfailing guidance and support, in technical work and in other aspects of life. Special thanks also go to Andrzej Strojwas, Herman Schmit and Ruchir Puri from IBM T.J. Watson Research Center. Thank you for co-advising me. VPGA is a large project, and I am fortunate to have the opportunity to work with a dynamic team of fellow graduate students, Padmini Gopalakrishnan, Kim Yaw Tong, Aneesh Koorapaty, Chetan Patel, Vyacheslav (Slava) Rovner, and R. Reed Taylor. Thank you for your contributions to various aspects of the VPGA project, and most importantly, thank you for your friendship. Last but not least, I would like to thank my parents for everything.


References

[1] B. Tyrrell, M. Fritze, D. Astolfi, R. Mallen, B. Wheeler, "Investigation of the physical and practical limits of dense-only phase shift lithography for circuit feature definition," Society of Photo-Optical Instrumentation Engineers, October 2002.
[2] E. Ahmed and J. Rose, "The Effect of LUT and Cluster Size on Deep-Submicron FPGA Performance and Density," ACM International Symposium on FPGAs, 2000.
[3] M. Palusinski, A.J. Strojwas and W. Maly, "Regularity in Physical Design," GSRC Workshop, Las Vegas, NV, June 17-18, 2001.
[4] L. McMurchie and C. Ebeling, "PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs," 1995.
[5] www.xilinx.com
[6] www.easic.com
[7] www.lightspeed.com
[8] K.Y. Tong, V. Kheterpal, V. Rovner, L. Pileggi, H. Schmit, and R. Puri, "Regular Logic Fabrics for Via Patterned Gate Array (VPGA)," IEEE Custom Integrated Circuits Conference, January 2003.
[9] J. Cong and Y. Ding, "Flowmap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs," IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, January 1994.
[10] B. Tyrrell, M. Fritze, D. Astolfi, R. Mallen, B. Wheeler, "Investigation of the physical and practical limits of dense-only phase shift lithography for circuit feature definition," Society of Photo-Optical Instrumentation Engineers, October 2002.
[11] A.J. Strojwas, "Process-Design Interaction Modeling Based Design For Manufacturability, Tutorial," Design Automation Conference, June 2003.
[12] V. Betz, J. Rose, and A. Marquardt, "Architecture and CAD for Deep-Submicron FPGAs," Kluwer Academic Publishers, 1999.
[13] A.J. Gonzales, J.L. Freyer, and S.S. Fok, "Recent results in the application of e-beam direct-write lithography," Proc. SPIE Vol. 1089, p. 374, 1996.
[14] www.siliconmetrics.com/Products/CR.asp
[15] www.mondes.com/products/dolphin.htm
[16] J.S. Kilby, "Miniaturized electronic circuits," US Patent 3,138,743, 23 June 1964.
[17] G.E. Moore, "Cramming more components onto integrated circuits," Electronics, Vol. 38, No. 8, April 1965.
[18] L.W. Liebmann, "Layout impact of resolution enhancement techniques: impediment or opportunity?," Proc. of International Symposium on Physical Design, Apr 2003, pp. 110-117.
[19] L. Pileggi, H. Schmit, A.J. Strojwas, P. Gopalakrishnan, V. Kheterpal, A. Koorapaty, C. Patel, V. Rovner, K.Y. Tong, "Exploring regular fabrics to optimize the performance-cost trade-off," Proc. of Design Automation Conference, June 2003.
[20] W. Maly, "IC design in high-cost nanometer-technology era," Proc. of Design Automation Conference, June 2001, pp. 9-14.
[21] P.S. Zuchowski, C.B. Reynolds, R.J. Grupp, S.G. Davis, B. Cremen, and B. Troxel, "A hybrid ASIC and FPGA architecture," Proc. of International Conference on Computer Aided Design, 2002, pp. 187-194.
[22] M.J.S. Smith, "Application-Specific Integrated Circuits," Addison Wesley Longman, Inc., Feb 1999.
[23] S. Brown and J. Rose, "FPGA and CPLD Architectures: a tutorial," Design and Test of Computers, Vol. 13, Issue 2, 1996, pp. 42-57.
[24] Z. Or-Bach, Z. Wurman, R. Zeman, and L. Cooke, "Customizable and programmable cell array," US Patent 6,331,790, 18 Dec 2001.
[25] L. Pileggi, H. Schmit, J. Shah, Y. Tong, C. Patel, and V. Chandra, "A Via Patterned Gate Array (VPGA)," Technical Reports Series of the CMU Center for Silicon System Implementation, No. CSSI 02-15, Mar 2002.
[26] C. Patel, A. Cozzie, H. Schmit, L. Pileggi, "An architectural exploration of Via Patterned Gate Arrays," Proc. of International Symposium on Physical Design, Apr 2003, pp. 184-189.
[27] P. Chow, S.O. Seo, J. Rose, K. Chung, G. Paez-Monzon, and I. Rahardja, "The design of SRAM-based field-programmable gate array -- part II: circuit design and layout," IEEE Trans. on VLSI Systems, Vol. 7, No. 3, Sept 1999, pp. 321-330.
[28] K. Yano, T. Yamanaka, T. Nishida, M. Saitho, K. Shimohigashi, and A. Shimizu, "A 3.8 ns CMOS 16 x 16 multiplier using complementary pass transistor logic," Proc. of Custom Integrated Circuits Conference, May 1989, pp. 15-18.
[29] F.S. Lai and W. Hwang, "Design and implementation of differential cascode voltage switch with pass-gate (DCVSPG) logic for high-performance digital systems," IEEE Journal of Solid-State Circuits, Vol. 32, No. 4, Apr 1997, pp. 563-573.
[30] D. Sylvester and H. Kaul, "Future Performance Challenges in Nanometer Design," Proc. of Design Automation Conference, June 2001, pp. 3-8.
[31] R. Gonzalez, B.M. Gordon, and M.A. Horowitz, "Supply and threshold voltage scaling for low power CMOS," IEEE Journal of Solid-State Circuits, Vol. 32, No. 8, Aug 1997, pp. 1210-1216.
[32] K. Usami and M. Horowitz, "Clustered voltage scaling technique for low-power design," Proc. of International Symposium on Low Power Design, Apr 1995, pp. 3-8.
[33] K. Usami, et al., "Design methodology of ultra low-power MPEG4 codec core exploiting voltage scaling techniques," Proc. of Design Automation Conference, June 1998, pp. 483-488.
[34] A. Koorapaty, V. Chandra, K.Y. Tong, C. Patel, L. Pileggi, and H. Schmit, "Heterogeneous programmable logic block architectures," Proc. of Design, Automation and Test in Europe, Mar 2003, pp. 1118-1119.
[35] R. Taylor, K.Y. Tong, H. Schmit, and L. Pileggi, "Gate Array Voltage Scaling (GAVS): enabling energy-efficiency in via-patterned gate array devices," Submitted to ICCAD 2003.