robust device modeling with process variation

ROBUST DEVICE MODELING WITH PROCESS VARIATION CONSIDERATION AND DIMENSION REDUCTION TECHNIQUES by Alexander Venelinov Mitev

Copyright © Alexander Venelinov Mitev 2009

A Dissertation Submitted to the Faculty of the ELECTRICAL ENGINEERING DEPARTMENT In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY In the Graduate College THE UNIVERSITY OF ARIZONA

2008

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Final Examination Committee, we certify that we have read the dissertation prepared by Alexander Venelinov Mitev entitled Robust Device Modeling with Process Variation Consideration and Dimension Reduction Techniques and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy. Date: 12 December 2008 Janet Wang

Date: 12 December 2008 Michael Marefat

Date: 12 December 2008 Susan Lysecky

Date: 12 December 2008 Leonardo Lopes

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College. I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement. Date: 12 December 2008 Dissertation Co-chair: Janet Wang

Date: 12 December 2008 Dissertation Co-chair: Michael Marefat


STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the copyright holder.

SIGNED: Alexander Venelinov Mitev

ACKNOWLEDGEMENTS

First of all, I would like to thank my advisor, Dr. Janet Wang, for her invaluable guidance and her efforts to support the research presented in this dissertation. I can only admire her original ideas and her unique talent for bringing seemingly unattainable goals within reach. I feel indebted to her, particularly for spending so much of her time on discussions and on sharing ideas and thoughts. I would also like to thank Dr. Kevin Cao from ASU for being a member of my comprehensive exam committee. Additionally, I am very grateful for the opportunity to work with his research team, to extend my research interests and to contribute to the device modeling project. I owe special thanks to Dr. Michael Marefat for his support, his wise guidance and for drawing my attention to the real value of my research. I wish to thank my friends Raj, Mahesh and Ajai in recognition of their special support for my studies. Finally, I must express my deep gratitude to my fiancee Anilda, whose support and encouragement gave me the strength to finish this dissertation.


DEDICATION

I dedicate this work to my family


TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1 Circuit simulations, BSIM models
  1.2 Circuit performance metrics
  1.3 Performance related macromodels
  1.4 Research goals and dissertation outline

CHAPTER 2 ROBUST GATE MODELING
  2.1 Introduction
  2.2 Finite Point based CSM model - Static Parameterizations
    2.2.1 The choice of finite points
    2.2.2 Finite point extraction
    2.2.3 Static current characterization for arbitrary gate
  2.3 Finite Point based CSM model - Dynamic Parameterizations
    2.3.1 Capacitive model
    2.3.2 Charge model
  2.4 Short circuit current characterization
  2.5 Leakage characterization for the CBFP gate model
  2.6 Process Variations for CBFP model
    2.6.1 Waveform extraction with SVD
    2.6.2 Process variation characterization in CBFP gate model
  2.7 Experimental Results

CHAPTER 3 DIMENSION REDUCTION TECHNIQUES
  3.1 Introduction
  3.2 Motivation and preliminaries
  3.3 Sliced Inverse Regression (SIR) based Dimension Reduction
    3.3.1 Basics of SIR
    3.3.2 Static Statistical Timing Analysis (SSTA) with Sliced Inverse Regression based Parameter Reduction
    3.3.3 Sliced Inverse Regression based Parameter Reduction Algorithm
    3.3.4 Special Cases
  3.4 Performance Oriented Dimension Reduction Techniques By Principle Hessian Directions
    3.4.1 RC circuit with process variations
    3.4.2 Parameter Reduction with PHD
    3.4.3 PHD Application in RC Interconnect Networks
    3.4.4 Generalized PHD and its algorithm
  3.5 Experimental results
    3.5.1 Parameter reduction with sampled data
    3.5.2 Parameter reduction with known performance function

CHAPTER 4 CONCLUDING REMARKS

APPENDIX A STEIN LEMMA
    A.0.3 First order derivative estimation
    A.0.4 Second order derivative estimation

REFERENCES


LIST OF FIGURES

1.1 Circuit analysis: a-Linear; b-Nonlinear; c-Nonlinear equivalent circuit
1.2 I-V family of curves with Finite Points
1.3 Corner point method: a-Performance space; b-Design space

2.1 I-V family of curves with Finite Points
2.2 Finite Point A related to $V_{GS}$
2.3 The choice of point B: a-Sah equation, b-same slope, c and d-Heuristic method
2.4 Single Input Switching: a-PDN $V_I$ switches; b-equivalent network
2.5 Static I-V curves for PDN for inverter (NMOS) and NAND gate
2.6 Point A (PDN) for different gates
2.7 Switch model of dynamic behavior of static CMOS inverter
2.8 Capacitance Finite Current Source Model
2.9 Dynamic parameterizations
2.10 Family of curves $C_m - V_I$ for different $V_O$
2.11 Finite Point Charge Model
2.12 Finite Point Charge Model - Dynamic characterization
2.13 Charge plots $Q(V_I, V_O)$
2.14 Short circuit current
2.15 Leakage current components
2.16 Logarithmic plot of subthreshold current
2.17 Determining the finite points from the I-V curves of leakage
2.18 Waveform generation in a gate chain
2.19 Input Waveform Family
2.20 The first three input principle components. Note that the 1st PC resembles the input waveform
2.21 The first four output principle components
2.22 Point B deviation due to variation at $L_{EFF}$ and $V_{TH}$
2.23 FP relation to $V_{TH}$
2.24 FP relation to $L_{EFF}$
2.25 Transient analysis comparison: BSIM vs. CBFP
2.26 Short-circuit current comparison: BSIM vs. CBFP
2.27 Waveform generation in a chain of gates
2.28 PDF of the leakage current for Inverter (single nMOS transistor)
2.29 PDF of the output delay comparison for Inverter and NAND2 gate
2.30 PDF of the output delay comparison for (3-left plot) and (4-right plot) chain of Inverters
2.31 PDF of the output delay comparison for (3-left plot) and (4-right plot) chain of NAND2

3.1 A Circuit with Correlated Parameters
3.2 Gate-interconnect example. Output delay is related to set of process variation parameters.
3.3 Delay correlated to: NMOS threshold voltage $V_{thn}$, PMOS threshold voltage $V_{thp}$, PMOS transistor length $L_p$ and π model $C_2$ ($C_1$)
3.4 Two statistically dominant directions with three dimension vectors
3.5 a-General model, where g = g(p) is the performance function; b-Reduced model, where m = m(z(p)) is the new reduced function
3.6 A RC interconnect network example
3.7 Circuit C17 from ISCAS benchmark circuits
3.8 Statistical Timing Analysis (SSTA) flow with Principle Hessian Direction (PHD) based reduction
3.9 Circuit supporting SSTA example
3.10 Delay PDF distribution comparison at the output of D4 and D5
3.11 Example of skewed distributions for some parameters
3.12 Delay φ accuracy comparison with different number of new parameters after reduction
3.13 PDF of the Delay at the output of RC interconnect line (top) and mesh (bottom)
3.14 The norm of the new directions $Z_1 - Z_3$
3.15 Frequency domain phase and amplitude accuracy comparison for C17
3.16 Time domain accuracy comparison for C17
3.17 The Dominant Directions for a 2-input and 2-output Interconnect Network: the main directions for Path A (Up), Path B (Middle) and combined (Bottom)


LIST OF TABLES

2.1 Input patterns for NAND2 gate
2.2 $I_A$ - Piecewise linear approximation
2.3 Expansion coefficients for $I_{DC}$ FP
2.4 Expansion coefficients for Charge FP
2.5 $I_A$ - Piecewise linear approximation
2.6 Delay mean values µ
2.7 Delay deviation values σ

3.1 Comparison between PHD and PCA on Benchmark Circuits
3.2 Entries of $\beta_1 p$ for Reduction considering paths A, B and both A and B
3.3 Entries of $\beta_2 p$ for Reduction considering paths A, B and both A and B
3.4 Entries of $\beta_3 p$ for Reduction considering paths A, B and both A and B

ABSTRACT

The high level of device integration in today's technologies affects the design process in several ways. Process variations (PV) significantly impact circuit performance. As a consequence, a major design consideration is determining how the production yield depends on technology-related manufacturing variations. Traditional Monte Carlo based sampling analysis has become computationally impractical because of the extensive parameter set and the computationally demanding transistor models, so the overall simulation time increases rapidly. In addition, higher device integration requires dealing with numerous local and global parameters. All these factors can become a bottleneck for a fast design cycle, and the computational cost rises dramatically as the transistor size decreases. Statistical analysis can be facilitated by estimating a direct relation between circuit performance factors and the PV parameters. Compact transistor models such as BSIM or PSP use a large number of parameters and equations. However, many performance factors can be related to only a few circuit parameters. For complementary gates we propose a new macromodel in which all static and dynamic characteristics are related to a set of Finite Points (FP). All timing and power related quantities can be predicted by evaluating the model equations. The dynamic characterization relies on the charge distribution at each node. The effect of all PV is represented by parameterizing the sensitivity of the FP to all variational sources. Overall, the new gate model employs the same computational structure for different gates and has a far simpler computational form than traditional BSIM models. This results in more efficient Monte Carlo analysis. Large scale circuit analysis based on the FP models can be used to estimate various global performance parameters. Such circuit measures, for example the propagation delay, are evaluated gradually from the circuit input to the desired output.

Therefore, at each computational step the objective functions, such as intermediate delays, are related to a larger set of parameters, including the process variations. Motivated by the limitations of traditional PCA, we propose an efficient reduction technique that lowers the overall computational cost. Additional information for pruning dimensions is obtained by measuring the input-output correlation, where any local performance metric has parameters as its input and a performance measure as its output. If this relation is unknown, the Sliced Inverse Regression (SIR) technique can be used to determine the Effective Dimension Reduction (EDR) space. If, however, an empirical analytic expression for the performance is available, the EDR space is found with the Principal Hessian Directions (PHD) method. Here the Hessian matrix of the performance function is used to reduce the parameter space further, on top of the initial reduction governed by PCA. From a theoretical standpoint, the inverse technique reduces parameters according to their statistical significance.

CHAPTER 1 INTRODUCTION

Many factors motivate technological advancement in industry. In electronics production, overall sales and profit are highly sensitive to the quality, reliability and manufacturing yield of the electronic components. This holds for any industry, and integrated circuit manufacturing is no exception. The unique history and evolution of the electronics industry, however, is due to factors unseen in any other industrial field. The trend of device scaling predicted by Moore's law, along with rising circuit complexity, has created a constant market demand for fast, low-power and cheap integrated devices. As a result, integrated circuits reach the market at the cutting edge of the technology, with little room left for further improvement. This sets new consumer standards, which are themselves overtaken within a short period of time. The booming electronics industry can be viewed in the light of the difficult compromises made at the circuit design stage, where technological limits are set against market demands and the desired profit. It is fair to say that such a trade-off can rule out even the most sophisticated engineering design when it is not economically feasible. Taking this analysis further, one can readily identify several design parameters of integrated circuits that are directly linked to market gain. Given the ultimate goal of designing fast, energy-saving, sophisticated and cheap ICs, the main concerns for digital ICs are naturally timing and power consumption. While design glitches can be detected at the verification stage, so that all necessary corrections are made before the final prototype, the impact of device parameter variability requires a different and more complex analysis. Finally, the semiconductor fabrication yield encompasses the effect of all undesired factors on IC performance.
In the early days of the electronics industry, faulty circuits were caused mainly by defects at the production stage, so the design process adopted techniques that ensure the product design can be tested. Known as Design for Testability (DFT), these techniques became standard design practice. As production volume increased and circuit design became more complex, a new design challenge emerged: how to combine all design requirements and economic goals in a profitable product. Driven by the ultimate goal of minimizing the overall cost and the number of faulty chips, this complex requirement underlies new methodology standards known as Design for Manufacturability (DFM). DFM techniques are developed to overcome problems such as functional yield loss caused by technological and equipment defects during manufacture and process variation yield loss, mainly at submicron technology scales, and to prevent other negative factors. In recent years, the significant impact of process variations on various circuit performance factors has led to the development of statistical models that consider the relation between yield and defect densities. This tendency gave rise to a new concept called Design for Yield (DFY), in which all design efforts are directed at predicting and minimizing the effect of process variations on faulty integrated circuits. This dissertation focuses on some technical aspects that contribute to DFY/DFM, viewed through basic applications of circuit performance estimation. One example of such an estimate is the output delay obtained from timing analysis. Another is the total consumed power, obtained from power analysis. These estimates are directly related to the yield, and an efficient estimate of the circuit performance is a step forward in support of the DFY/DFM effort. Two major directions are considered in this work. In the first part of the dissertation an efficient gate macromodel is presented.
It can greatly accelerate yield modeling by simplifying the relations between the sources of variation and the performance functions of interest. Yield estimation essentially relies on Monte Carlo analysis, and the number of samples is closely tied to the desired statistical accuracy. However, large-scale system integration imposes the heavy burden of handling numerous process variation parameters. Because of this fact, known as the "curse of dimensionality", Monte Carlo analysis has become impractical in general statistical applications. An efficient dimension reduction method, which represents the circuit performance estimators with a reduced set of variational parameters, can tremendously simplify the statistical analysis. These reduction techniques are discussed in the second part of this work.

Figure 1.1: Circuit analysis: a-Linear; b-Nonlinear; c-Nonlinear equivalent circuit

1.1 Circuit simulations, BSIM models

Electronic circuit simulation refers to any activity in which the physical behavior of an actual circuit is replicated by mathematical models. One of the first simulators is Berkeley SPICE (Simulation Program with Integrated Circuit Emphasis), which dates back to the early 1970s. SPICE has become the industry-standard program for electrical simulation, with hundreds of thousands of copies worldwide. The purpose of this chapter is to provide the basics of circuit simulation with SPICE at a general, conceptual level. These methods are closely tied to the SPICE philosophy of using hybrid physical-empirical nonlinear transistor models, which is the most widespread modeling approach.

The circuit is described in a netlist file as a set of linear and nonlinear devices. Generally there are three levels of complexity [45]: linear DC circuit analysis, nonlinear DC circuit analysis and AC analysis. The first example is a circuit consisting of fixed current sources and linear resistors (Fig. 1.1-a), which can be analyzed with Kirchhoff's and Ohm's laws. SPICE uses modified nodal analysis in this case, but mesh analysis is applicable as well. At the next level, we would like to analyze similar circuits that now contain nonlinear elements such as diodes or transistors. Figure 1.1-b shows a sample circuit of this kind. The diode, regarded as a two-terminal element, is replaced with an appropriate model in Figure 1.1-c. Note that the model elements are nonlinear functions of the junction voltage, $C_i, R_i, G_i, I_d \approx F(V_3)$. Because of the nonlinear relation $I_d \sim e^{V_3}$, it is no longer possible to find an analytic solution of the system $I_3(V_3) = 0$. The solution is obtained with the iterative Newton-Raphson method. The method does not always converge, which entails additional computational effort.

The most important circuit features are revealed by observing nodal voltages and currents in the time domain. The simulator's transient analysis is performed by solving the differential equations that arise from reactive elements such as capacitors and inductors. For example, the capacitor current is $I_c = C_i \frac{dV_3}{dt}$, and in SPICE this differential equation is solved numerically. Assuming a small constant step $dt \sim \Delta t$, we can write:

• Forward Euler method:
$$I_c(t) = C_i \frac{V_3(t + \Delta t) - V_3(t)}{\Delta t} \tag{1.1}$$

• Backward Euler method:
$$I_c(t) = C_i \frac{V_3(t) - V_3(t - \Delta t)}{\Delta t} \tag{1.2}$$

SPICE takes advantage of combining equations 1.1 and 1.2 in what is referred to as:

• Trapezoidal rule:
$$I_c(t + \Delta t) = 2C_i \frac{V_3(t + \Delta t)}{\Delta t} - 2C_i \frac{V_3(t)}{\Delta t} - I_c(t) \tag{1.3}$$
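As an illustration of the three integration rules, the following Python sketch applies them to a single linear RC stage charged from a constant source. The circuit, the element values and the function name `simulate_rc` are assumptions made for this example only; they are not taken from SPICE or from the circuit of Figure 1.1, and a real simulator would additionally run Newton-Raphson iterations at every step for nonlinear elements.

```python
import math


def simulate_rc(method, r=1e3, c=1e-9, v_src=1.0, dt=1e-7, steps=10):
    """Integrate C*dV/dt = (v_src - V)/r from V(0) = 0 with a fixed step dt.

    Hypothetical single RC stage; returns the capacitor voltage after `steps` steps.
    """
    tau = r * c
    v = 0.0
    i_c = (v_src - v) / r  # capacitor current at t = 0
    for _ in range(steps):
        if method == "forward_euler":
            # Eq. 1.1: Ic(t) = C*(V(t+dt) - V(t))/dt, with Ic(t) known from V(t)
            v_next = v + dt * i_c / c
        elif method == "backward_euler":
            # Eq. 1.2 written at t+dt: Ic(t+dt) = C*(V(t+dt) - V(t))/dt,
            # combined with Ic(t+dt) = (v_src - V(t+dt))/r
            v_next = (v + (dt / tau) * v_src) / (1.0 + dt / tau)
        elif method == "trapezoidal":
            # Eq. 1.3: Ic(t+dt) = (2C/dt)*(V(t+dt) - V(t)) - Ic(t),
            # combined with Ic(t+dt) = (v_src - V(t+dt))/r
            g = 2.0 * c / dt
            v_next = (v_src / r + g * v + i_c) / (1.0 / r + g)
        else:
            raise ValueError(f"unknown method: {method}")
        v = v_next
        i_c = (v_src - v) / r
    return v


# Compare against the analytic solution V(t) = v_src*(1 - exp(-t/tau)) at t = tau.
exact = 1.0 - math.exp(-1.0)
errors = {
    name: abs(simulate_rc(name) - exact)
    for name in ("forward_euler", "backward_euler", "trapezoidal")
}
for name, err in errors.items():
    print(f"{name:15s} absolute error = {err:.6f}")
```

With a step of one tenth of the time constant, the trapezoidal rule lands much closer to the analytic value than either Euler variant, which is consistent with it being second-order accurate.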

17 Therefore, the transient analysis solves the system differential equations stepwise in linearized form. In addition since we have nonlinear Ci and Id, Newton-Raphson method can be employed to reach solution. Obviously the choice of time step will correlate with the simulation accuracy. Additional methods can be utilized such as Gear integration, Volterra series expansion, needed for distortion analysis etc. The tendency of constant scaling of the technology parameters requires developing of suitable models for each device size. As we mentioned the diode was substituted with equivalent circuit consisting four elements - two resistances, current source and capacitor. This topology is based on the semiconductor physics and processes occurred in PN junction. For example the resistance Ri models the semiconductormetal resistance, the current source Id depict the well known exponential equations, Ci is the junction capacitance and Gi the equivalent junction resistance, both variable with the junction voltage. The physical sense of these models are found in the equation of Id , where the empiric sense is for example the expression of Ci = Ci(V3 ), which can be found purely by curve fitting. For CMOS, Bipolar and related devices the equivalent model has more complicated topology, but the device characterization is performed in similar fashion. For sub-micron devices many additional phenomena occur and this for example is the driving motivation of model improvement. This process consist not only updated parametric values, but also extended model topologies. As an example BSIM4 CMOS model for 65nm technology size is build upon of hundreds equations and consist more than two hundred parameters [8]. Summarizing these observations by following: • The traditional circuit simulators are closely based on mathematical models and replicate the exact physical processes. 
This fact imposes the heavy burden of utilizing iterative algorithms and potentially limit the computational productivity. • The most recent transistor models such is BSIM v4 impose the tradeoff of accurate modeling with the price utilizing complicated equivalent circuit topology and model equations. Hence this additionally aggravates the perspective of

18

R C/2

C/2

Figure 1.2: I-V family of curves with Finite Points running large transistor circuits for less time. 1.2 Circuit performance metrics As it was discussed before, several circuit parameters are crucial to the overall circuit fitness. For example if the timing requirement are not met, even at the output of a single gate, this can jeopardize the entire design. In addition, ”hot chips’ exceeding just with several watts the prescribed consumed power, again cannot comply with the assembly requirement etc. Obviously the circuit design flow is limited to maintain various specifications, and computational tools especially related to timing and consumed power are highly appreciated. In this dissertation we regard results from timing analysis as a typical circuit performance function. In fact compare to some other circuit fitness indexes, we found timing measures to be at most suitable for studying the effect of process variations. The main reason is that timing quantities usually is related to the delay measures of various gates and interconnects, essentially described with broad set of global and local parameters. The digital circuits are build upon numbers of transistors which are organized into gates and confected by interconnect wires. The transistors modeling was discussed in previous section. The interconnects are represented by resistors capacitor and inductors. For the needs of timing analysis, the transient simulation of equivalent circuit representation by nonlinear and linear elements over a specified period of time requires numerous time steps iteratively running nonlinear solver routine at each step. In some applications where the accuracy can be sacrificed such a timedomain analysis method is far too elaborate and time-consuming and fast timing

19 analysis techniques emerged in the design practice. The following is a brief overview of the techniques used in timing analysis. The on-chip wire can be modeled as pair of resistor-capacitor corresponding to infinitesimal length dx. In practice a discreet segments can be used as in [43] is discussed segmentation of 1000µ wire into 10 π-models as the one shown on figure 1.2. Further the interconnect branching is studied and modeled Elmore delay metrics. However this method characterize the delay with first order expression in P P the form T = T ( Ri Ci). More accurate is the Asymptotic Waveform Evaluation (AWE) method [39], where it explore the transfer function and by using moment matching reduce the system order (Pade approximant). This clearly relate the interconnect delay for example to fewer process variation parameters, seen as geometric properties at each π-segment. AWE impose some issues associated with the quality and numerical stability of the solution. Extensions for further improvement are developed, such as Krilov subspace method and the block Arnoldy method. Presently widely used is the Passive Reduced-order Interconnect Macromodeling Algorithm (PRIMA) [35]. Considering combinational gate timing, the common delay metrics includes different approaches. Lookup tables can relate the delay to different capacitive loads and input arrival times. Cell delay modeling, where the timing is expressed as a equation in the form T = k1 CL + k2 and further expanded to k-factor model including the influence of the input arrival time. nonlinear polynomial models are proposed from Synopsys. The effective capacitance model is common technique for output delay calculation, especially when the driver resistance is comparable to the wire resistance in the lumped capacitance model. As a first step OBrien-Savarino method transform any RC load to equivalent RC-π model. 
The capacitance coupling effect occurred among spatially close wires cause crosstalk noise or significant delay changes. Calculating the corresponding Miller capacitance is the simplest first order approximation suitable for fast timing analysis. Large combinational circuits consist numerous gates and interconnect and can

20

p2

Performance space

d2

Design space

p2max

p2min p1min

p1max

a

d1

p1 b

be represented as a timing graph $G = (V, E)$, where $V$ are the gate outputs and $E$ are the paths from the output of one gate to the input of the next. Such a graph is referred to as a directed acyclic graph (DAG). Static timing analysis (STA) refers to finding the worst-case delays through the circuit, independently of the input arrival times. Algorithmic solutions are based on traversing the circuit DAG. However, at submicron technology scales the impact of manufacturing variability is significant. This led to the extension of STA to statistical static timing analysis (SSTA), where the timing is not a fixed number but a probability density function (PDF) that takes the statistical distribution of the parametric variations into consideration.

Figure 1.3: Corner point method: (a) performance space; (b) design space

Generally, the inter-die variations are well studied and tend to have less impact on timing metrics. A common technique is to simulate the design at different corner points, where the design parameters ($L_{EFF}$, $T_{OX}$, $W_{EFF}$) take their worst-case values, thereby bounding the range of the performance metrics (delay, power dissipation, etc.). In the nanoscale era this approach has become impractical, owing to the increasing number of variation sources and the difficulty of capturing the design-performance relation appropriately (Figure 1.3). On the other hand, the intra-die (spatial) variations can cause dramatic delay deviation [11]. The need to account for this negative impact led to treating some variables as local, such as the variation of $L_{EFF}$ and $W_{EFF}$ (Across Chip Linewidth Variation, ACLV) due to lithography flaws, and Inter Layer Variations (ILV) due to chemical mechanical polishing. The intra-chip variations are commonly represented with three terms, a deterministic local part, a deterministic global part and a random part: $\delta = \delta_{local} + \delta_{global} + \varepsilon$. The global variation is location dependent and can be related to absolute coordinates on the die. The local variations are proximity dependent and layout specific. The random part $\varepsilon$ is usually modeled with a Gaussian distribution. Finally, any design parameter of interest $p_i$ is related to the parameter variation as $p_i = \bar{p} + \delta_{global}(x, y) + \delta_{local} + N(0, \sigma)$. The local variability, however, requires precise spatial parameterization, such as a grid model or quad-tree partitioning [2].

Statistical static timing analysis essentially explores the circuit timing graph and assumes a delay impact at each node (gate) caused by gate variations. Hence the output delay statistics result from the statistical properties of the inputs and of the gate. Early work in SSTA simply propagates the probability density functions of the delay through each node by convolving the gate and input arrival PDFs [24]. Alternatively, one may perform path-based analysis, traversing entire paths to find the delay distribution of the circuit. The main limitation of the path-based method is the large number of paths to be considered. This is partially overcome by the block-based method, where the delay is propagated in topological order and each node is processed once. It is considered to be much faster in design practice. Most approaches to SSTA are based on two fundamental operations, max and sum, which can be described as follows:

$$d_Y = \max(d_A + d_{A \to Y},\; d_B + d_{B \to Y}) \qquad (1.4)$$

where $d_Y$ is the gate output delay, $d_A$ and $d_B$ are the arrival times at inputs A and B, and $d_{A \to Y}$, $d_{B \to Y}$ are the propagation delays. Since these delays are Gaussian distributed variables, the sum operation results in a Gaussian, but the max operation does not. Approximation techniques are employed to resolve this issue. An important point is that presumably normally distributed variations can result in non-Gaussian performance metrics. This partially explains the nonlinear parameter-performance mapping in Figure 1.3.
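The text does not name a specific approximation for the max operation. One commonly used choice is Clark's moment matching, which replaces $\max(X, Y)$ of two Gaussians by a Gaussian with the same first two moments. The sketch below applies it to equation 1.4 under the assumption that arrival times and gate delays are independent; all function names are illustrative and are not taken from the dissertation.

```python
import math


def normal_pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_sum(m1, s1, m2, s2, rho=0.0):
    """(mean, std) of X + Y; the sum of jointly Gaussian variables stays Gaussian."""
    return m1 + m2, math.sqrt(s1 * s1 + s2 * s2 + 2.0 * rho * s1 * s2)


def clark_max(m1, s1, m2, s2, rho=0.0):
    """(mean, std) of the Gaussian that matches the first two moments of max(X, Y)."""
    theta = math.sqrt(max(s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2, 0.0))
    if theta == 0.0:
        # X - Y is deterministic, so the max is simply the larger variable.
        return max(m1, m2), s1
    alpha = (m1 - m2) / theta
    cdf_a, cdf_b, pdf = normal_cdf(alpha), normal_cdf(-alpha), normal_pdf(alpha)
    mean = m1 * cdf_a + m2 * cdf_b + theta * pdf
    second = ((m1 * m1 + s1 * s1) * cdf_a
              + (m2 * m2 + s2 * s2) * cdf_b
              + (m1 + m2) * theta * pdf)
    return mean, math.sqrt(max(second - mean * mean, 0.0))


def gate_output_delay(arr_a, arr_b, d_ay, d_by):
    """Equation 1.4 on (mean, std) pairs, assuming all four delays are independent."""
    path_a = gaussian_sum(*arr_a, *d_ay)
    path_b = gaussian_sum(*arr_b, *d_by)
    return clark_max(*path_a, *path_b)


if __name__ == "__main__":
    # Hypothetical arrival times and gate delays in picoseconds.
    print(gate_output_delay((100.0, 5.0), (95.0, 8.0), (20.0, 2.0), (22.0, 3.0)))
```

The result is again a (mean, standard deviation) pair, so it can be propagated to the next node of the timing graph in a block-based traversal. The true distribution of the max is skewed, which this two-moment fit ignores.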

We shall emphasize the most common facts related to large-scale circuit performance analysis. With timing as a representative example, we regard any circuit fitness measure as nonlinearly related to a large parameter set. The problems induced by this assumption greatly affect statistical circuit analysis, where both deterministic and statistical methods must be employed to ensure accuracy and computational efficiency in the estimation of performance metrics.

1.3 Performance related macromodels

To overcome the high computational demands of Monte Carlo analysis and standard SPICE simulations, new fast macromodels have been developed. These models address different aspects of circuit performance, of which timing and power dissipation are the most crucial for the design of digital ICs. There are two sources of power dissipation: dynamic and static. The dynamic power is related to the current that charges and discharges the gate capacitances during switching. This portion is estimated by [38]:

$$P = C V_{DD} \Delta V f \alpha \qquad (1.5)$$

where $C$ is the total capacitance, $V_{DD}$ is the supply voltage, $\Delta V$ is the voltage swing, $f$ is the clock frequency and $0 < \alpha < 1$ is the expected number of transitions. Dynamic power dissipation in CMOS circuits also includes the short-circuit current flowing directly from the supply to ground. This essential element can be parameterized successfully by studying the static current components of the pull-up network (PUN) and pull-down network (PDN), $I_{SHCK} = \min(I_{PDN}, I_{PUN})$. Finally, the static power dissipation, related to the steady state of the gate, is usually attributed to leakage such as drain-substrate and gate-oxide current. To characterize power consumption successfully, we must consider each power related component. In other words, the gate switching capacitance, the short-circuit current and the leakage current are the candidates for estimation and precise modeling. Note that since this estimation is carried out for a single gate, we impose a strict accuracy requirement because of the large number of gates. For the sake of completeness we add that power modeling at a larger scale, such as a complete system or the microarchitectural level, requires estimation of $\alpha$ in equation 1.5. The analysis essentially uses a technique known as Stochastic Automata Networks, where the system power-performance model is built upon a Markov process and the results are much more realistic than the worst-case scenario [38].

Fast timing models are common in design practice. As discussed in the previous section, there is a large variety of methods; these, however, are beyond the goals of this introduction. An essential technique employed in the experimental parts is the Probabilistic Collocation Method (PCM) [25]. The PCM is a response surface method that explores the relationship between an output function (gate delay) and several parameters, such as input arrival time and process variation parameters. Note the physical inconsistency of the explanatory variables (timing, layout geometry), which places the method in the class of empirical techniques. Despite this observation, the achieved accuracy is sufficient for practical application. The essence of the PCM is to approximate the delay with a set of orthogonal polynomials:

$$\hat{T}_d = A_0 + A_1 H_1(p_1) + A_2 H_2(p_2) \qquad (1.6)$$

where the polynomials $H_i(p_k)$ are determined from the orthogonality property:

$$\int_p f_p \, H_i(p) H_j(p) \, dp = 0, \quad i \neq j \qquad (1.7)$$

The model is evaluated by substituting each variational parameter in $H_k(p_i)$ with collocation points chosen from the zeros of the polynomial $H_{k+1}(p_i)$. An appropriate heuristic for constructing the evaluation samples is proposed in [51]. Note that if the accuracy of 1.6 is not sufficient, the model is evaluated with a higher-order polynomial set. The probability density function $f_p$ in 1.7 determines the polynomial family; for Gaussian variables we use Hermite polynomials. There are techniques for constructing polynomials for variables with arbitrary PDFs, so that condition 1.7 always holds. The described method serves well for calculating the timing performance of a single gate and has been utilized as a supplementary method in this dissertation.
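As a minimal sketch of the collocation idea, consider a single standard normal parameter $p \sim N(0, 1)$ and a second-order expansion in the probabilists' Hermite polynomials $He_0 = 1$, $He_1 = p$, $He_2 = p^2 - 1$. The collocation points are then the zeros of $He_3(p) = p^3 - 3p$. The single-parameter setting, the function names and the toy delay function (a stand-in for a circuit simulation) are assumptions made for illustration only.

```python
import math


def hermite_basis(p):
    """Probabilists' Hermite polynomials He0, He1, He2 evaluated at p."""
    return (1.0, p, p * p - 1.0)


# Collocation points for a second-order model: the zeros of He3(p) = p^3 - 3p.
COLLOCATION_POINTS = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))


def fit_pcm(simulate):
    """Fit T(p) = A0 + A1*He1(p) + A2*He2(p) from three calls to `simulate`."""
    y_minus, y_zero, y_plus = (simulate(p) for p in COLLOCATION_POINTS)
    # He2 equals -1 at p = 0 and 2 at p = +/-sqrt(3), so the 3x3 linear
    # system for (A0, A1, A2) has a simple closed-form solution.
    a1 = (y_plus - y_minus) / (2.0 * math.sqrt(3.0))
    a2 = (0.5 * (y_plus + y_minus) - y_zero) / 3.0
    a0 = y_zero + a2
    return a0, a1, a2


def pcm_eval(coeffs, p):
    """Evaluate the fitted response surface at parameter value p."""
    return sum(a * h for a, h in zip(coeffs, hermite_basis(p)))


def pcm_moments(coeffs):
    """Mean and variance of the model for p ~ N(0, 1), using E[He_n^2] = n!."""
    a0, a1, a2 = coeffs
    return a0, a1 * a1 + 2.0 * a2 * a2


def toy_gate_delay(p):
    """Hypothetical stand-in for a simulated gate delay in picoseconds."""
    return 10.0 + 2.0 * p + 0.5 * p * p


if __name__ == "__main__":
    coeffs = fit_pcm(toy_gate_delay)
    print("coefficients:", coeffs)
    print("mean, variance:", pcm_moments(coeffs))
```

Because of orthogonality, the mean and variance of the delay follow directly from the coefficients, without any further sampling. Only three evaluations of the expensive function are needed for the second-order single-parameter model.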

1.4 Research goals and dissertation outline

The numerical estimation of the impact of process variations on the production yield poses various computational difficulties. First, it should be noted that in design practice various circuit performances, related to different parameters, may be of interest. In many applications we are interested in individual parametric models for each performance function (power related and timing). Secondly, it has been shown that efficient compact models can compete with BSIM models in speed and accuracy. The first objective of this work is to provide the basis of a new robust device macromodel that closely approximates the most important design metrics, such as timing and power dissipation. The distinctive goal is a new modeling approach that handles accurate approximation of the gate output waveform together with simplified power dissipation calculation. In addition to these requirements, the task of characterizing process variations in the context of compact gate modeling determines the final research direction. The modeling itself is based on approximating the charge and current flow between the gate nodes. The model equations are related to a set of specific points, hence the name Finite Point (FP) gate model. This contribution makes it possible to capture the impact of process variations efficiently and to parameterize various gate performance functions. The Finite Point gate models are discussed in Chapter 2 of this work.

The IC manufacturing yield can generally be related to global circuit performance metrics, which are affected by various local and global performance measures. For example, circuit timing is related to gate timing, and as long as we consider intra-die variations, we can relate the local delay calculation to the local variational parameters. At large scale, however, the delay calculation for an IC is troublesome because of the growing parameter space.
This claim is valid for different performance metrics, but timing will be discussed extensively in this work. This leads us to the secondary objective of this dissertation, which is to develop an efficient method of parameter order reduction for applications closely related to performance estimation. The proposed method considers the variability of the parameter set (mainly due to process variations) and prunes parameters with less statistical significance. The problem of dimension reduction is discussed in Chapter 3.

CHAPTER 2 ROBUST GATE MODELING

2.1 Introduction

As discussed in the previous chapter, circuit performance deteriorates due to many factors. One of the reasons motivating this research is the evident dominance of manufacturing uncertainties in production yield loss. Moreover, when designing ICs for new submicron technologies, the physically based device models become more complicated, which makes it hard to find an explicit relation between circuit performance and system variability. We mentioned the necessity of pushing the envelope of some design parameters. Some performance metrics are critical for the overall performance quality and usually do not tolerate any deviation from the designer specifications. For example, in analog circuit design signal integrity is an important design requirement, where insufficient signal bandwidth or a low signal-to-noise ratio due to manufacturing flaws can jeopardize the entire system. In digital VLSI circuits, timing and power related measures are key design requirements. To make the situation worse, in the nanometer regime many physical effects, such as process variations and transistor leakage, that previously had little impact on timing and power are becoming increasingly important to these metrics [21]. To incorporate these emergent issues into circuit analysis, accurate and efficient gate models that consider timing, power and their variability are critical for standard cell library characterization. Existing library characterization relies on SPICE simulation with compact transistor models (e.g., BSIM or PSP). Though these models are physical and scalable, they use a large number of parameters and equations and require many iterations at each stage of the simulation, which limits their efficiency in today's VLSI design flow. In this situation, gate models considering both power and timing with sufficient efficiency and accuracy are in high demand. The following review shows that existing gate models consider only one aspect of the performance, i.e. either timing or power, but not both.

One of the recent advances in parametric waveform modeling is the K-factor model, used solely for fast gate timing analysis. Assuming a capacitive load, the gate output delay $t_d$ and the gate output transition times $t_f$ and $t_r$ are characterized as functions of the input transition time $t_{in}$ and the output capacitance $C_L$. These performance measures are fitted to a so-called k-factor equation of the form $t_{d,r,f} = K(t_{in}, C_L)$. In [16] the model is extended by combining the advantages of the k-factor model with switching resistor models. The second-order driving admittance is modeled with an equivalent π-model. Finally, in [23] and [37] the model indexes lookup tables by input slew, load capacitance and effective load capacitance, to capture the resistive shielding effects. The iterative evaluation converges to an accurate solution suitable for modern submicron devices.

In general, many waveform shapes resemble the cumulative distribution function (CDF) of a probability distribution such as the Gamma or Weibull distribution. Several methods have been proposed; in [7], for example, a fitting process determines the distribution parameters.

To improve accuracy for gates with highly nonlinear devices and signal propagation, current source models (CSM) have been proposed. A CSM accommodates arbitrary input waveforms and loads and provides the complete output waveform, rather than just delay and output slew. The model characterization is carried out through a series of simulations and, in general, by solving differential equations. For example, Blade and Razor, described in [15], are waveform reconstruction methods that depend on the input and output voltage. The Blade cell model works in two phases. First, it models the output waveform exclusively from DC-derived I-V tables.
In the second step it applies adjustments to capture the effects of aggregate internal capacitance. The interconnect modeling package Razor uses the results derived from Blade and characterizes the effect of the interconnect linked to the gate output. Moment matching techniques are used to derive the output waveform. Since the current source is derived from DC characteristics and assumes a linear capacitance at the output, this model is limited to certain applications. The main difficulties come from the requirement to deal with nonlinear loads, as well as the presence of cross capacitance effects. To describe the effect of the load, one requires a nonlinear capacitance. In [28] a two-stage waveform independent model is proposed. The input stage is modeled with a second-order RC network representing the internal circuit dynamics from the input of the gate to a dummy node. The current source and capacitor are controlled by the input and output stages. This capacitor captures the nonlinear effects at the output. Unfortunately, these techniques are not able to consider complex input waveforms, slew rate, cross capacitances, nonlinear capacitances and process variations simultaneously.

The multiport current source models address the problem of multiple input switching. The input and output ports of the gate are studied independently, as each associated node is modeled with a dependent current source and a charge distribution. The approximation of the charge and current models requires several transient simulations.

The aim of combining several performance measures in one compact model motivated the development of a robust gate modeling scheme that evaluates both timing and power metrics. Unlike existing standard cell models, where leakage is a given constant value, the proposed gate models predict the leakage and timing with a set of finite points. The robust finite point model is developed by studying the static and dynamic circuit characteristics of complementary gates. A current source model can sufficiently capture the steady state behavior, and the charge distribution parameterization at each model node extends the model to real-world applications. The Current Source Model (CSM) discussed in [5] and [6] is widely accepted in the IC industry. CSM has been shown to enhance accuracy through better characterization by choosing more time points.
The key idea of the finite-point transistor model (FPM) is to extract only a finite number of data points from the I-V curve, based on their physical meaning and their importance in circuit operation. The dependence on process and design variations is embedded in these points. The entire I-V behavior is then extrapolated from these points with simple polynomial equations. FPM [17] describes the circuit operation in terms of the I-V and C-V characteristics of the transistor. Here we focus on the I-V model of FPM, which we later extend to a gate FPM model.

The following is a brief description of the Finite Point Model. We pick a few points based on the physical meaning of the transistor behavior and then construct the I-V curve from these critical points. In general, a three-argument notation is adopted to describe the FPM points: $X(V_{GS}, V_{DS}, I_{DS})$. Figure 2.1 shows a set of three points used to construct I-V curves. These three points, $A(V_{DD}, V_{DD}, I_{On})$, $B(V_{DD}, V_{Lin}, I_{Lin})$ and $C(V_{DD}, V_{start}, I_{start})$, are of major importance as they define the operating regions of a transistor. Point A is identified by letting $V_{GS} = V_{DD}$ and $V_{DS} = V_{DD}$ and measuring the maximum current, denoted $I_{On}$. Point B marks the boundary between the saturation and linear regions. Following the Sah equation cited in [4], we identify point B as the saturation-triode corner point. Point B can be located by setting $V_{GS} = V_{DD}$ and using the Sah equation to obtain $V_{Lin}$ and $I_{Lin}$. For the gate FPM model, we can apply a similar strategy to locate points A and B. Point C ensures the linearity between $I_{DS}$ and $V_{GS}$ in saturation. Since this point is not used in the gate FPM model, interested readers can find its derivation in [31]. The closed-form piecewise I-V model is then developed from these critical points and the associated regions to establish the relationship between the current and a set of finite points: $I_{DS} = I_{DS}(X, V_{GS}, V_{DS})$. For example, for the linear region the governing equation is:

$$I_{DS} = J \left[ \frac{I_{ON} - I_{LIN}}{V_{DD} - V_{LIN}} \right] V_{DS} + K (V_{GS} - V_{START}) \left( M (V_{LIN} - V_{DS})^2 + I_{LIN} \right) \qquad (2.1)$$

and the saturation region is modeled with:

$$I_{DS} = J \left[ \frac{I_{ON} - I_{LIN}}{V_{DD} - V_{LIN}} \right] V_{DS} + K (V_{GS} - V_{START}) I_{LIN} \qquad (2.2)$$

where $J$, $K$ and $M$ are coefficients whose derivations are described in [31]. Equations 2.1 and 2.2 share the same first term, which keeps the first derivative continuous at the boundary between the linear and saturation regions and models the short channel effect and the Drain Induced Barrier Lowering (DIBL) effect in the saturation region.
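The piecewise structure of equations 2.1 and 2.2 can be sketched as follows. The derivation of $J$, $K$ and $M$ is given in [31] and is not reproduced here, so the coefficient values and finite points below are placeholders chosen purely for illustration; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FinitePoints:
    vdd: float      # supply voltage [V]
    i_on: float     # current of point A [A]
    v_lin: float    # drain voltage of point B [V]
    i_lin: float    # current of point B [A]
    v_start: float  # voltage V_START associated with point C [V]


def fpm_ids(vgs, vds, fp, j, k, m):
    """Drain current from equation 2.1 (linear region) or 2.2 (saturation)."""
    # First term, shared by both regions.
    slope_term = j * (fp.i_on - fp.i_lin) / (fp.vdd - fp.v_lin) * vds
    gate_term = k * (vgs - fp.v_start)
    if vds < fp.v_lin:
        return slope_term + gate_term * (m * (fp.v_lin - vds) ** 2 + fp.i_lin)
    return slope_term + gate_term * fp.i_lin


# Placeholder finite points and coefficients (assumptions, not from [31]).
EXAMPLE_POINTS = FinitePoints(vdd=1.0, i_on=600e-6, v_lin=0.4,
                              i_lin=500e-6, v_start=0.3)
EXAMPLE_J = 1.0
EXAMPLE_K = 1.0 / (EXAMPLE_POINTS.vdd - EXAMPLE_POINTS.v_start)
# This M makes the bracket in equation 2.1 vanish at V_DS = 0.
EXAMPLE_M = -EXAMPLE_POINTS.i_lin / EXAMPLE_POINTS.v_lin ** 2


if __name__ == "__main__":
    for vds in (0.0, 0.2, 0.4, 0.7, 1.0):
        ids = fpm_ids(1.0, vds, EXAMPLE_POINTS, EXAMPLE_J, EXAMPLE_K, EXAMPLE_M)
        print(f"V_DS = {vds:.1f} V  I_DS = {ids * 1e6:.1f} uA")
```

At $V_{DS} = V_{LIN}$ the quadratic part of equation 2.1 and its derivative both vanish, so the two branches join with a continuous value and a continuous first derivative regardless of the coefficient values.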

2.2 Finite Point based CSM model - Static Parameterizations

2.2.1 The choice of finite points

Figure 2.1: I-V family of curves with Finite Points (axes: $I_{DC}$ versus $V_o$; curves labeled $V_i = V_{DD}$ and $V_i = V_{th}$; marked points A, B, C, $A_i$, $B_i$)

The robust CSM of multiple-input gates includes the effects of process variations, complex input and output waveforms, and nonlinear capacitances, which are generally neglected by other existing models. We use static complementary CMOS design as an example. A complex gate has a pull-up network (PUN) and a pull-down network (PDN). Naturally, one model represents the CSM for the PUN and another the CSM for the PDN. The CSM consists of a dependent current source, an input capacitance and an output capacitance. The current source depends on the input and output voltages. The input capacitance models the coupling effect between the input and output stages, while the output capacitance models the load impact on the gate.

In steady state the model behavior is described by the dependent current source, where $I_{DS} = I_{DS}(V_I, V_O)$. As mentioned before, the entire PDN or PUN is regarded as a three-terminal device with nodes $V_I$, $V_O$ and GND. Therefore, for multiple-input gates, separate models must be provided according to the input pattern. The following discussion considers the PDN; a similar analysis applies to the PUN. Since the simplest gate is the inverter, the PDN is a single nMOS transistor. Generally, the three-terminal model for a single transistor corresponds to the common source configuration.

Figure 2.2: Finite Point A related to $V_{GS}$ (axes: $I_{DS,PDN}$ versus $V_{GS}$ [V] at $V_{DS} = V_{DD}$)

The insight motivating the FP model is as follows. Referring to Figure 2.1, which shows a typical family of $I_{DS}$-$V_{DS}$ curves for an nMOS transistor, consider a single curve with $V_{GS}$ = const. Clearly there are two distinct regions for the drain current: the triode region and the saturation region. Moreover, if the delimiting points $B_i$ and $A_i$ are known, the current can be characterized with a linear function in saturation and a quadratic function in the triode region, with parameters $B_i$ and $A_i$. Now, for an arbitrary input $V_{GS} = V_{GSi}$, the input voltage affects the parameters $B_i = B_i(V_{GSi})$ and $A_i = A_i(V_{DD}, I_x(V_{GSi})) = A_i(V_{GSi})$. Hence the entire family of static current characteristics can be modeled in terms of the two defining points as follows, where we drop the index $i$ for clarity and $V_{GS}$ and $V_{DS}$ are variables:

• Triode mode: IDS = VDS (

VSAT (VGS ) + VSAT − VDS (VGS )) ISAT (VGS )

(2.3)

• Saturation mode: $I_{DS} = V_{DS} \cdot \mathrm{slope}(V_{GS})$ (2.4)

As the equations above suggest, the relation of points $B_i$ and $A_i$ to the input voltage ($V_{GS}$) is crucial for the final parameterization. As Figure 2.1 shows, the upper bound for these points occurs at $V_{GS} = V_{DD}$. Letting $A = M(V_{DD}, I_x(V_{GS} = V_{DD}))$ and $B = L(V_{SAT}(V_{GS} = V_{DD}), I_{SAT}(V_{GS} = V_{DD}))$, we define these as the finite points of the device. When the input voltage sweeps from $V_{DD}$ to zero, the current coordinates of points A and B gradually decline to zero, as does the voltage coordinate of point B. Figure 2.2 shows the relation $I_x = I_x(V_{GS})$ for an nMOS transistor. This component is exactly the current coordinate of point A, projected for different input voltages $V_I$. A straightforward approach is to apply a piecewise linear approximation of this curve, as it avoids additional complication in the general case. As will be shown later, for different complementary gates this current indicator can display strongly nonlinear behavior.

2.2.2 Finite point extraction

The approximation of point $B = B(V_{GS})$ requires additional consideration. First, it requires an assumption about how to choose this point. For a long-channel device, once the output voltage $V_{DS}$ rises enough to pinch off the channel, the drain current becomes independent of the drain voltage (disregarding channel-length modulation). For a short-channel device the saturation point occurs at lower values, as in [20]. Although it has a clear physical meaning, the saturation point B is not trivial to extract from the device characteristics. This apparent disadvantage, however, allows a design tradeoff, since different extraction methods are available. For example, Figure 2.3-a shows the well-known Sah law, valid for long-channel devices. In this case $V_L = V_{GS} - V_{TH}$ and $I_L = \frac{k}{2}(V_{GS} - V_{TH})^2$. Since the ultimate goal is to fit the drain curves shown in Figure 2.1 as accurately as possible, a simpler approach is possible. Instead of using the Sah equation, point B may be defined simply as the point where $dI_{DS}/dV_{DS}$ takes a certain constant value. This case is shown in Figure 2.3-b.
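The two extraction rules described above can be compared on sampled data. The following sketch computes point B from the Sah law and, alternatively, as the first sample where the numerical slope $dI_{DS}/dV_{DS}$ falls to a chosen limit. The idealized square-law transistor is used only to generate sample data, and the parameter values, the slope limit and the function names are assumptions for illustration.

```python
def sah_point(vgs, vth, k):
    """Point B from the Sah law: V_L = V_GS - V_TH, I_L = k/2 (V_GS - V_TH)^2."""
    vov = max(vgs - vth, 0.0)
    return vov, 0.5 * k * vov * vov


def square_law_ids(vgs, vds, vth, k):
    """Idealized long-channel drain current, used here only as sample data."""
    vov = max(vgs - vth, 0.0)
    if vds < vov:
        return k * (vov * vds - 0.5 * vds * vds)
    return 0.5 * k * vov * vov


def slope_point(vds_samples, ids_samples, slope_limit):
    """First sample where the forward-difference slope drops to slope_limit."""
    for i in range(len(vds_samples) - 1):
        dv = vds_samples[i + 1] - vds_samples[i]
        slope = (ids_samples[i + 1] - ids_samples[i]) / dv
        if slope <= slope_limit:
            return vds_samples[i], ids_samples[i]
    # The slope never dropped to the limit within the sampled range.
    return vds_samples[-1], ids_samples[-1]


if __name__ == "__main__":
    vgs, vth, k = 1.0, 0.3, 2e-3          # hypothetical device parameters
    vds = [i / 1000 for i in range(1001)]
    ids = [square_law_ids(vgs, v, vth, k) for v in vds]
    print("Sah law point B:   ", sah_point(vgs, vth, k))
    print("same-slope point B:", slope_point(vds, ids, slope_limit=2e-4))
```

With a nonzero slope limit the second rule places B slightly below the Sah corner, which illustrates the tradeoff mentioned in the text: the constant-slope definition needs only the measured curve, not the threshold voltage or the gain factor.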

Figure 2.3: The choice of point B: (a) Sah equation, (b) same slope, (c) and (d) heuristic method (axes: $I_{DC}$ versus $V_o$; annotations include $I_B \approx (V_{GS} - V_{TH})^2$, $I_1 \approx V_{DS}^2$ and $I_2 \approx V_{DS}$)

Another approach is to assume a linear dependence of the saturation point on the input voltage $V_{GS}$. This choice simplifies the $I_{DS}$ equations, which are stated as follows:

• For linear region:

$$I_{DS} = I_{SQ} - \frac{I_{SQ} (V_{DS} - V_{Lin})^2}{V_{Lin}^2} + \frac{V_{GS} (I_{On} - I_{Lin})}{V_{DD} (V_{DD} - V_{Lin})} \qquad (2.5)$$

34 • Saturation region: VGS (IOn − ILin ) IOn − ILin − VDD (VDD − VLin ) VDD − VLin IOn − ILin = IX − VGS VDD − VLin VGS (IOn − ILin ) = ISQ + VDD (VDD − VLin )

IDS = IX + VGS

(2.6)

ISQ

(2.7)

IDS

(2.8) (2.9)

As a final step, a heuristic approach is applied, aiming to minimize the overall approximation error. Figure 2.3-c shows a sample choice of $B_i$. To ensure proper fitting, two requirements must be met. First, we seek to construct $I_{DS}$ from a linear component $I_2$ and a quadratic component $I_1$. In addition, the curve must be smooth at the boundary point $B_i$. Hence, considering a finite set of points $B_i \in \{B_1, \dots, B_n\}$, each choice yields a different approximation error. For example, the choice of $B_1$ in Figure 2.3-c results in a large error due to mismatch in the saturation region, whereas the choice of $B_n$ results in mismatch in the triode region. This strategy has two main advantages. First, it provides a smooth regression through the saturation point. Secondly, the heuristic approach selects the candidate with the smallest approximation error.

Algorithm 1 The best saturation point algorithm
procedure BestPointB(V, I, vMIN, vMAX)
    for all v ∈ (vMIN, vMAX) do
        Vds >= v : linear, I2(Vds)                                ▷ Fit linear polynomial
        Vds = v : dI2/dVds (linear) = dI2/dVds (quadratic)        ▷ Match the derivatives
        Vds
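The listing of Algorithm 1 is incomplete in this copy, so the following is only a sketch of the heuristic as the surrounding text describes it, not the author's exact procedure. For each candidate voltage it fits a line to the samples at or above the candidate, fits a quadratic below it with value and slope matched at the candidate, and keeps the candidate with the smallest total squared error. The error criterion, the value-matching constraint and all function names are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_for_breakpoint(vds, ids, v):
    """Linear fit for Vds >= v, quadratic for Vds < v, smooth at Vds = v."""
    hi = [(x, y) for x, y in zip(vds, ids) if x >= v]
    lo = [(x, y) for x, y in zip(vds, ids) if x < v]
    a, b = fit_line([x for x, _ in hi], [y for _, y in hi])
    i_b = a + b * v  # current at the candidate point B
    # I1(V) = i_b + b*(V - v) + c*(V - v)^2 matches the value and the slope
    # of the line at V = v, leaving a single free coefficient c.
    num = sum((y - i_b - b * (x - v)) * (x - v) ** 2 for x, y in lo)
    den = sum((x - v) ** 4 for x, _ in lo)
    c = num / den
    err = sum((y - (a + b * x)) ** 2 for x, y in hi)
    err += sum((y - (i_b + b * (x - v) + c * (x - v) ** 2)) ** 2 for x, y in lo)
    return err, (v, i_b), (a, b, c)


def best_point_b(vds, ids, v_min, v_max):
    """Return (error, (V_B, I_B)) for the best candidate in (v_min, v_max).

    Assumes sorted, distinct vds samples with vds[0] <= v_min and
    v_max <= vds[-1], so both fits always have enough samples.
    """
    best = None
    for v in vds:
        if not (v_min < v < v_max):
            continue
        err, point, _ = fit_for_breakpoint(vds, ids, v)
        if best is None or err < best[0]:
            best = (err, point)
    return best


if __name__ == "__main__":
    k, vov = 2e-3, 0.7  # hypothetical square-law sample curve
    vds = [i / 100 for i in range(101)]
    ids = [k * (vov * v - 0.5 * v * v) if v < vov else 0.5 * k * vov * vov
           for v in vds]
    print(best_point_b(vds, ids, 0.2, 0.95))
```

Restricting the candidates to the sampled voltages keeps the search a simple scan; a finer candidate grid or a one-dimensional minimizer could replace the loop if the measured curve is sparse.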
