Efficient Nanoscale VLSI Standard Cell Library Characterization Using a Novel Delay Model Sandeep Miryala, Baljit Kaur, Bulusu Anand and Sanjeev Manhas Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Abstract— Accurate estimation of delays in Static Timing Analysis (STA) using a Non-Linear Delay Model (NLDM) based Look-Up Table (LUT) is a major challenge in nanometer-range VLSI circuits. Issues with NLDM-based LUTs are mostly due to the arbitrary choice of input signal transition time (trin) and load capacitance (Cl) values and the large number of simulations to be performed for characterizing an entire standard cell library. In this paper, we present a systematic method to reduce standard cell library characterization time significantly. For this purpose we propose and use a simple and physically reasonable logic gate delay model in which delay varies linearly with Cl and trin. We also determine its region of validity in the (Cl, trin) space. We express the delay model coefficients and its region of validity as functions of inverter (or logic gate) size. We do not use device current/capacitance models in our work and hence the method is general enough to be used with scaling. With the help of the proposed model, we were able to save approximately 51% of the SPICE simulations during standard cell library characterization. We observe that the delay obtained using our LUTs is as accurate as the delay obtained through traditional LUTs.
I. INTRODUCTION
Static Timing Analysis (STA) is an essential and critical step in the VLSI chip design flow. It is a fast method of finding data-path delay without resorting to SPICE simulation. The accuracy of STA depends on the accuracy of the delay model used [1]. The Non-Linear Delay Model (NLDM) based Look-Up Table (LUT) is widely used in industry for estimating delay in STA [1]. In the LUT approach, delay is obtained using SPICE simulation for a fixed number of input transition time (trin) and load capacitance (Cl) values [2]. Delay for other values of (Cl, trin) is obtained using linear interpolation. The number of table entries is large (about 100) for nanoscale CMOS technologies. This greatly increases the time and resources needed for standard cell characterization. The issue is critical because standard cell characterization is done at several Process, Voltage and Temperature (PVT) corners due to process variations.

Many researchers have proposed delay models (DM) for CMOS inverters and logic gates. Some of these do not take variations in trin into account [3]. Some of them take trin variations into account but are complicated and must be extracted for each logic gate separately. Most of these are based on a transistor current model and therefore their validity with transistor scaling is not guaranteed [4][5][6][7]. For these reasons it is hard to use them in standard cell characterization.
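The table lookup described above can be sketched as follows. This is a minimal illustration, assuming the interpolation is bilinear over the two table axes; the function name `interpolate_lut`, the axis values and the table contents are hypothetical and are not taken from any particular library or tool.

```python
from bisect import bisect_right


def interpolate_lut(trin_axis, cl_axis, table, trin, cl):
    """Bilinear interpolation of table[i][j], indexed by (trin_axis[i], cl_axis[j]).

    Both axes must be sorted in increasing order and hold at least two points.
    Queries outside the table are linearly extrapolated from the edge cell.
    """
    def bracket(axis, value):
        # Index i of the cell [axis[i], axis[i + 1]] used for this value,
        # clamped so that the cell always exists.
        i = bisect_right(axis, value) - 1
        return min(max(i, 0), len(axis) - 2)

    i = bracket(trin_axis, trin)
    j = bracket(cl_axis, cl)
    t0, t1 = trin_axis[i], trin_axis[i + 1]
    c0, c1 = cl_axis[j], cl_axis[j + 1]
    u = (trin - t0) / (t1 - t0)  # fractional position along the trin axis
    v = (cl - c0) / (c1 - c0)    # fractional position along the Cl axis
    return ((1 - u) * (1 - v) * table[i][j]
            + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1]
            + u * v * table[i + 1][j + 1])


# Hypothetical 3x3 delay table (trin in ps, Cl in fF, delay in ps).
trin_axis = [10.0, 20.0, 40.0]
cl_axis = [2.0, 4.0, 8.0]
table = [[2.0 * t + 3.0 * c + 1.0 for c in cl_axis] for t in trin_axis]
print(interpolate_lut(trin_axis, cl_axis, table, 15.0, 3.0))
```

Every entry of `table` would normally come from one SPICE run, which is why the size of the table drives the characterization cost.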
We observe that, over a certain region of the (Cl, trin) space, the delay of a logic gate (Fig. 1(a)) is a linear function of its input transition time trin and load capacitance Cl. In this paper, we address the problem of avoiding redundant simulations in library characterization in this region. We develop a physically based delay model for an inverter as a function of Cl, TR¹ and the size of the inverter (represented henceforth in this paper by its NMOS device's width Wn²). This semi-empirical model is a linear function of (Cl, trin) and is valid in a range of (Cl, trin). We also show that the bounds of the region of validity of the model are a simple function of the size of the logic gate (Wn). We obtain the coefficients of the delay model using HSPICE simulations. Within the region of validity of the model, delay values in the LUT need not be obtained using simulations; they can be obtained from the proposed model, thereby saving precious time in library characterization.

The paper is organized as follows: In Section II, we describe the proposed delay model. In Section III, we discuss the region of validity of the delay model. In Section IV, we verify the model using HSPICE. In Section V, we describe the proposed semi-empirical model for the output transition time Trout of an inverter and verify it using HSPICE. In Section VI, we show that the model is valid even with technology scaling.

II. LINEAR DELAY MODEL
In this section, using physical reasoning, we show that delay varies linearly with Cl and TR when these parameters are within a certain range. We do the analysis for an inverter with its output being discharged, as shown in Figure 1(b). In this paper, the word "delay" stands for 50% delay. We derive the delay model for the case of a rising transition of Vin. The same derivation can be easily extended to a falling transition of Vin, as shown by our results.
We assume VGS = (VDD/TR) t, where VGS is the gate-source voltage of the inverter's NMOS device, VDD is the power supply voltage and t is time. We relax this assumption later. We also claim that, within the region of validity of our delay model, the NMOS device operates in saturation until Vin reaches VDD. We justify this claim later in the paper. The output discharge comprises two regions: first, when the input transitions from 0 to VDD and second, when the input voltage Vin = VDD. As we show later in this paper, for a large number of points in the LUT, the NMOS device is in saturation in the first region. We derive the delay expression for such values of TR and Cl.

¹ Throughout this paper we use the term trin for the 20% to 80% input transition and TR for its 0-100% equivalent.
² In a standard cell library the ratio of an inverter's NMOS and PMOS device widths is kept constant.

Fig. 1. (a) CMOS inverter (b) Input and output waveforms of an inverter

The output discharge ΔQ(TR) from 0 to TR is,

\[
\begin{aligned}
\Delta Q(T_R) &= \int_0^{T_R} I_{ds}\,dt && (1)\\
&= T_R \int_0^1 f\!\left(\frac{V_{GS}}{V_{DD}}, \frac{V_{DS}}{V_{DD}} \cong 1\right) d\!\left(\frac{V_{GS}}{V_{DD}}\right) && (2)\\
&= T_R \int_0^1 f(x, y = 1)\,dx && (3)\\
&= S_T T_R && (4)
\end{aligned}
\]

Here, Ids = f(VGS/VDD, VDS/VDD) is the NMOS drive current, ST is a constant proportional to Wn, x = VGS/VDD and y = VDS/VDD. The generalized expression of current as a function of VGS and VDS enables us to include second-order effects in the expression. We assume that y = VDS/VDD ≅ 1 for the NMOS device since it is operating in the saturation regime. We assume that the PMOS device is very weak compared to the NMOS device due to the rising transition at the input node. We justify this assumption later in this paper. The output transition time can be further divided into two regions: (a) Δt1, when the NMOS device is in saturation, and (b) Δt2, when the NMOS device operates in the linear region. Assuming that in Δt1 the output transition is from Vout(TR) to VDD − VTH (see Figure 1(b)),

\[ \Delta t_1 = \frac{(C_l + C_p)V_{TH} - S_T T_R}{I_{ON}} \tag{5} \]

Here, Cp is the parasitic capacitance of the inverter seen at its output node and ION is the NMOS device ON current. We observe that in Equation 5 the coefficient of Cl, i.e. VTH/ION, is inversely proportional to Wn. Since Cp, ST and ION are proportional to Wn, the remaining terms are independent of Wn. We now consider the discharge from VDD − VTH to VDD/2 in Δt2. In this region the NMOS device operates in the linear region and the output node can be treated as an RC network. Hence, Δt2 ∝ (Cl + Cp). Therefore, the total delay from Figure 1(b) is

\[ \text{Delay} = \frac{T_R}{2} + \Delta t_1 + \Delta t_2 \tag{6} \]

Making use of the expressions derived for Δt1 and Δt2 in the previous paragraphs, we group the coefficients associated with Cl as K1, the coefficients associated with TR as K2 and the independent terms as K3:

\[ \text{Delay} = K_1 C_l + K_2 T_R + K_3 \tag{7} \]

where K1, K2 and K3 are constants which are extracted by fitting the model to the HSPICE simulation data. Further, we make the following observations:
∙ Observation 1: K1 and K3 are linear functions of 1/Wn.
∙ Observation 2: K2 is independent of Wn.
In the next section, we discuss the region of validity of this semi-empirical delay model (Equation 7).

III. REGION OF VALIDITY OF LINEAR DELAY MODEL
In this section, we derive the region of validity of the assumption that the NMOS device is in saturation from 0 to TR. From this saturation condition of the MOSFET and Equation 4, we can write,

\[ \Delta Q(T_R) = S_T T_R \le (C_l + C_p)V_{TH} \tag{8} \]
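Equation 8 can be rearranged for TR to make its dependence on Cl explicit. The step below is only a restatement of Equation 8 with the symbols already defined; it assumes ST > 0, so dividing by it does not change the direction of the inequality.

```latex
S_T T_R \le (C_l + C_p)\,V_{TH}
\quad\Longrightarrow\quad
T_R \le \frac{V_{TH}}{S_T}\,C_l + \frac{V_{TH}\,C_p}{S_T}
```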
For a given value of Cl, the linear delay model of Equation 7 is valid for all values of TR which satisfy Equation 8. We denote the maximum value of TR which satisfies Equation 8 as trb. From Equation 8, trb is a linear function of Cl. We extract the slope and intercept of this linear function by fitting to SPICE simulation data. We observe from Equation 8 that
∙ Observation 3: The slope of the trb versus Cl plot is proportional to 1/Wn.
∙ Observation 4: The intercept is constant with Wn. This is because ST ∝ Wn and Cp ∝ Wn.
Using Equation 8 and a similar analysis, one can derive, for a given TR, the corresponding limiting value of Cl for which Equation 8 holds. We denote this value of Cl as Clb. In the next section, we discuss the verification of the linear delay model of Equation 7 and the region of validity expression given by Equation 8.

IV. VERIFICATION OF LINEAR DELAY MODEL
In this section we validate the results of Sections II and III using HSPICE simulations. We also extract the coefficients of Equation 7 from the simulation data. We use 45nm PTM CMOS technology model files³ in these simulations. We simulate inverters with Wp and Wn adjusted such that the rise and fall transition times are equal. Figure 2(a) is a plot of simulated delay versus trin for several values of Cl. We use the symbol trin for the 20% to 80% transition at the input of a logic gate. We show that Equation 7 fits the data well up to an upper bound on trin. We also validate the two predictions made in Section II. The first is that the slope of the delay versus trin plot (K2) is independent of Wn. The second is that the intercept of the delay versus trin plot (K1Cl + K3) varies linearly with 1/Wn. Figures 11(a) and 11(b) verify Obs1 and Obs2. Now we discuss this upper bound on trin, i.e. trb. The variation of trb(Cl) with Cl is linear, as can be seen from Figure 11(c). We show in Figure 3 that the slope (Strb) and intercept (Ctrb) of this linear variation of trb with Cl are proportional to 1/Wn and independent of Wn, respectively. These results validate Equation 8. This also verifies Obs3 and Obs4. As a justification for using a ramp input, we observe that the delays with ramp and realistic inputs are linearly related, as can be seen from Figure 4.

³ Obtained from http://www.eas.asu.edu/~ptm/

Fig. 2. (a) Delay variation with trin (b) Delay variation with Cl. Points are simulated data and discontinuous lines are fitting of Eqn. 7.
Fig. 4. Delay with actual input vs. delay with ramp input
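The coefficient extraction for Equation 7 can be illustrated with an ordinary least-squares fit. This is a minimal sketch under stated assumptions: the paper only says the model is fitted to HSPICE data, so the choice of least squares, the function names and the sample values (Cl in fF, times in ps) are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate data: need distinct Cl and TR values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_delay(samples):
    """Least-squares fit of delay = K1*Cl + K2*TR + K3 to (Cl, TR, delay) samples."""
    samples = list(samples)
    rows = [(cl, tr, 1.0) for cl, tr, _ in samples]
    delays = [d for _, _, d in samples]
    # Normal equations (A^T A) k = A^T y for the three unknown coefficients.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, delays)) for i in range(3)]
    k1, k2, k3 = solve_linear_system(ata, aty)
    return k1, k2, k3


# Hypothetical "simulated" points generated from known coefficients.
samples = [(cl, tr, 3.0 * cl + 0.2 * tr + 5.0)
           for cl in (2.0, 4.0, 8.0) for tr in (20.0, 50.0, 100.0)]
k1, k2, k3 = fit_linear_delay(samples)
print(round(k1, 6), round(k2, 6), round(k3, 6))
```

Only points with the transition time below trb(Cl) should be passed to such a fit, since Equation 7 is not expected to hold outside that region.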
Fig. 3. (a) K2 in Eqn. 7 with Wn (b) K1Cl + K3 in Eqn. 7 with Wn (c) trb variation with load capacitance for different sizes (d) Strb variation with Wn

Say, for an inverter of a given size, we know the simulated values of delay for two values of trin at each of the load capacitances Cl1 and Cl2. Say that we also know trb(Cl1) and trb(Cl2) from simulations. From these values we can deduce K1, K2, K3 and trb(Cl) for any value of Cl for the inverter using the observations of Section III.

V. MODEL FOR OUTPUT SIGNAL TRANSITION TIME (Trout)
The input transition time of a logic gate in a data-path is the output transition time of its driver stage. Therefore, an LUT of the output transition time Trout of logic gates, expressed as a function of trin and Cl, is also required in standard cell characterization data for STA. In this section, we express Trout of an inverter as a simple function of trin and Cl for trin ≤ trb(Cl). In this work, we denote the output's 80%-20% transition by Trout. There are two cases (Case 1 and Case 2) for the output transition: first, where the entire 80-20% output transition occurs after t = TR and second, where a part of the output transition occurs for time t ≤ TR. We analyze the two cases as follows:

Fig. 5. Output response of an inverter for deriving the model for Trout: (a) Case 1 (b) Case 2

Case 1: Vout(TR) ≥ 0.8VDD: The 80-20% output transition occurs after the inverter's input voltage Vin has reached VDD. Case 1 can be clearly understood from Fig. 5(a). Therefore, the load (Cl + Cp) is discharged from a voltage 0.8VDD to VDD − VTH through the NMOS drive current ION. From VDD − VTH to 0.2VDD, the device operates in the linear regime. Therefore, Trout varies linearly with Cl. The parameters Cp and ION are proportional to Wn and the transistor's equivalent resistance in the linear regime is inversely proportional to Wn. Therefore, we make the following observations:
∙ Observation 5: The slope of the variation of Trout with Cl is proportional to 1/Wn.
∙ Observation 6: The intercept is independent of Wn.
We validate Obs5 and Obs6 in Figures 6(a) and 6(b).
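Observations 5 and 6 suggest that a Trout versus Cl line fitted at one inverter size can be reused at another size. The sketch below illustrates that idea, assuming the slope scales exactly as 1/Wn and the intercept does not change; the function names and numeric values are hypothetical.

```python
def rescale_trout_line(slope_ref, intercept_ref, wn_ref, wn_new):
    """Rescale a fitted Case 1 line Trout = slope*Cl + intercept to a new NMOS width.

    Assumes Observation 5 (slope proportional to 1/Wn) and
    Observation 6 (intercept independent of Wn) hold exactly.
    """
    if wn_ref <= 0 or wn_new <= 0:
        raise ValueError("device widths must be positive")
    return slope_ref * wn_ref / wn_new, intercept_ref


def trout_case1(cl, slope, intercept):
    """Evaluate the Case 1 output transition time for a load capacitance cl."""
    return slope * cl + intercept


# Hypothetical reference fit at Wn = 90 nm: slope in ps/fF, intercept in ps.
slope, intercept = rescale_trout_line(4.0, 10.0, wn_ref=90.0, wn_new=180.0)
print(trout_case1(5.0, slope, intercept))
```

In practice the reference slope and intercept would come from simulated Trout values at two load capacitances for the reference inverter.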
Fig. 6. (a) Variation of Trout with Cl before Trb′ for different NMOS widths (b) Slope of variation of Trout

We denote the value of TR for which Vout(TR) = 0.8VDD by Trb′. From Eqn. 8 we observe that 0.8VDD = Vout(Trb′) = VDD − ST Trb′/(Cl + Cp). Therefore, Trb′ varies linearly with Cl. We also observe that the slope of this variation varies linearly with 1/Wn and the intercept stays constant with Wn. We verify the linear variation in Figure 7(a).

Fig. 7. (a) Variation of Trb′ with load capacitance (Cl) (b) Variation of Trout with trin for trin ≤ trb

Case 2: Vout(TR) ≤ 0.8VDD: This happens for values of TR > Trb′, as can be observed from the discussion on Case 1. The response waveform for this case is shown in Fig. 5(b). From the figure we can write,

\[ T_{rout} = T_R - t_{0.8V_{DD}} + t_1 \tag{9} \]

In Equation 9, t0.8VDD is the time taken by the output of the inverter to discharge from VDD to 0.8VDD and t1 is the time for the output to discharge from Vout(TR) to 0.2VDD. During the time t1 the NMOS device is in the linear regime; hence the output node behaves as an RC network and t1 is given by Equation 10:

\[ t_1 = \alpha R_{eq} C \tag{10} \]

Now we find t0.8VDD making use of the expression given by Equation 12.

\[
\begin{aligned}
0.2\,V_{DD}(C_l + C_p) &= \int_0^{t_{0.8V_{DD}}} I_{ds}\,dt && (11)\\
&= \mu_n C_{ox} \frac{W_n}{L} \frac{V_{DD}}{T_R}\, t_{0.8V_{DD}}^{2} && (12)
\end{aligned}
\]

The final expression for Trout after plugging in t1 and t0.8VDD is of the form

\[ T_{rout} = a T_R + b\sqrt{T_R} + c \tag{13} \]

We observe that the coefficients a, b and c vary with load capacitance and device width as follows:
∙ Observation 7: Coefficient a is independent of the load capacitance (Cl).
∙ Observation 8: Coefficient a is independent of the width of the NMOS device (Wn).
∙ Observation 9: Coefficient b is directly proportional to √Cl.
∙ Observation 10: Coefficient b is inversely proportional to √Wn.
∙ Observation 11: Coefficient c is directly proportional to Cl.
∙ Observation 12: Coefficient c is inversely proportional to Wn.
All these observations are verified using HSPICE simulations, which also provide the coefficient values of the model. Figs. 8 and 9 verify Observations 7 to 12. Therefore, as with delay, it is sufficient to obtain Trout versus trin data for two values of Cl for a given inverter size using SPICE simulation. Using this approach, simulated and computed values of Trout for trin ≤ trb(Cl) match with a tolerable average error of 4.5% for several inverter sizes (Wn, 2Wn, ...). The model and the variation of its coefficients with load and device size, for both delay and output transition time, are developed at the 45nm technology node. In the next section we show that the form of the delay model remains the same with a change in technology node.

Fig. 8. Variation of Trout model coefficients in Case 2 with the load capacitance, which verifies Observations 7, 8 and 9

VI. IMPACT OF TECHNOLOGY SCALING ON THE DELAY MODEL
In this section we verify that the delay model developed in the previous sections remains valid with scaling of the technology node. The simulated data in this section is obtained at the 32nm technology node. Figs. 10(a) and 10(b) show the variation of delay with the transition time at a given load capacitance and with the load capacitance at a given transition time, respectively. The points are the simulated data and the discontinuous lines are the curve fits of the delay model of Equation 7. We also verified the variation of the coefficients in the delay model and of the breakpoints with the load and with the device width (Wn), as predicted in Sections II and III. Hence, the novel linear delay model is valid with technology scaling, which is the first of its kind when compared to the various delay models that have been developed in the past. In the next section we make use of the delay model to develop the LUTs and compare them with the traditional LUTs.

Fig. 9. Variation of Trout model coefficients in Case 2 with the device width (Wn), which verifies Observations 10, 11 and 12
Fig. 10. (a) Delay variation with trin (b) Delay variation with Cl. Points are simulated data and discontinuous lines are fitting of Eqn. 7
Fig. 11. (a) K2 in Eqn. 7 with Wn (b) K1Cl + K3 in Eqn. 7 with Wn (c) trb variation with load capacitance for different sizes (d) Strb variation with Wn

VII. AN OPTIMIZED DELAY LOOK-UP TABLE (LUT)
In this section we use the results of Sections IV-V to replace simulated values of delays and output transition times for trin ≤ trb with their respective computed values and thus save simulation effort in standard cell characterization. For values of trin > trb, we use HSPICE simulations to obtain delays. For inverters of other sizes, we extrapolate the delay values for trin ≤ trb using the relationship between the delay model coefficients and inverter size given by the observations. We obtain the LUT for Trout values using a similar method. We verify our method on a sample standard cell library having inverters of sizes: minimum NMOS device width Wmin, 2Wmin, 3Wmin, 4Wmin and 5Wmin, with the Wp/Wn ratio as in a standard cell. We generate two sets of LUTs of delay and Trout, one entirely using HSPICE simulations and the other using our method. We assume that the largest value of transition time for the library is 500ps and select the largest value of Cl for each inverter which satisfies this condition. We build arbitrary-size inverter chains using our library and simulate them using HSPICE. We compare the inverter chain delays obtained using both the traditional simulation-generated LUTs and our modified LUTs in Fig. 12(b). The LUTs generated using our method compare well with entirely simulation-generated LUTs even while reducing the required number of simulations by a maximum of 60% for cell library characterization. Figure 12(a) shows the percentage saving of SPICE simulations for different LUT sizes.

Fig. 12. (a) Savings in HSPICE simulations in comparison with the traditional method of characterization (b) Relative error in measurement of delay of various buffers using model LUTs and HSPICE
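The LUT construction of Section VII can be sketched as below. This is an illustrative outline rather than the authors' tool flow: `simulate` is a hypothetical callback standing in for an HSPICE run, the coefficients are assumed to have been fitted against trin directly, and trb(Cl) is assumed to be the fitted straight line of Section III.

```python
def build_hybrid_lut(trin_axis, cl_axis, k1, k2, k3,
                     trb_slope, trb_intercept, simulate):
    """Fill a delay LUT, simulating only the points outside the linear region.

    For each (trin, Cl) entry the linear model of Eqn. 7 is used when
    trin <= trb(Cl) = trb_slope*Cl + trb_intercept; otherwise simulate(trin, cl)
    is called. Returns the table and the fraction of entries not simulated.
    """
    table = []
    simulated = 0
    for trin in trin_axis:
        row = []
        for cl in cl_axis:
            trb = trb_slope * cl + trb_intercept
            if trin <= trb:
                row.append(k1 * cl + k2 * trin + k3)  # model value, no simulation
            else:
                row.append(simulate(trin, cl))        # outside region of validity
                simulated += 1
        table.append(row)
    total = len(trin_axis) * len(cl_axis)
    return table, 1.0 - simulated / total


# Hypothetical axes and coefficients (trin in ps, Cl in fF); the lambda is a
# placeholder for a real simulator call.
table, saved = build_hybrid_lut(
    [10.0, 50.0, 200.0], [2.0, 8.0],
    k1=3.0, k2=0.2, k3=5.0, trb_slope=10.0, trb_intercept=20.0,
    simulate=lambda trin, cl: float("nan"))
print(saved)
```

The returned fraction ignores the few simulations needed to fit the coefficients and trb(Cl) in the first place, so it is an upper estimate of the saving for a single cell.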
VIII. CONCLUSION AND FUTURE WORK
We show that if an upper bound on the input transition time trin is observed, a simple delay model is valid for inverters which relates delay linearly to trin and load capacitance Cl. We also derive the relation of the delay model coefficients to inverter size Wn (assuming that the ratio of its NMOS and PMOS device widths remains constant) and simple relations which express trb as a function of Cl and Wn. We derive similar relations which relate the output transition time Trout to trin, Cl and Wn. To derive these relations we did not use device current/capacitance models. We use the topology of the gate and the charging/discharging phenomenon of the load stage. Therefore, these relations are general in nature and would not change with technology scaling. Using these relations, we show that standard cell library characterization can be done with significantly fewer simulations (up to 60% reduction) while maintaining accuracy. This is useful since numerous cycles of standard cell characterization would be needed at several Process, Voltage and Temperature (PVT) corners in deep sub-micron technologies. Another potential application of this work is in increasing the accuracy of standard cell characterization data in the form of LUTs. This is because the LUT points in the region of validity of the linear delay model do not need simulations. Therefore, to increase the accuracy of the LUT, simulations can be performed to obtain delay for additional points where delay is a highly nonlinear function of (trin, Cl). As future work, we will extend the relations we obtained for the inverter to other logic gates, multi-stage standard cells and sequential circuit elements.

REFERENCES
[1] Louis Scheffer, EDA for IC Implementation, Circuit Design, and Process Technology, Addison-Wesley, Reading.
[2] http://www.opensourceliberty.com
[3] Ivan Sutherland, Bob Sproull and David Harris, "Logical Effort: Designing Fast CMOS Circuits," Morgan Kaufmann Publications, 1999.
[4] T. Sakurai and R. Newton, "Alpha-power law MOSFET model and its implications to CMOS inverter delay and other formulas," IEEE JSSC, pp. 584-594, April, 1990.
[5] Jian Chang, Louis G. Johnson and Cheng Liu, "Piecewise Linear Delay Modeling of CMOS VLSI Circuits," IEEE IMSCAS, August, 2009.
[6] Yangang Wang and Mark Zwolinski, "Analytical Transient Response and Propagation Delay Model for Nanoscale CMOS Inverter," IEEE ISCAS, Nov, 2009.
[7] N. Hedenstierna and K.O. Jeppson, "CMOS Circuit Speed and Buffer Optimization," IEEE Trans. on Computer-Aided Design, pp. 270-281, March, 1987.