Physical DFT for High Coverage of Realistic Faults
M. Saraiva, P. Casimiro, M. Santos, J.T. Sousa, F. Gonçalves, I. Teixeira, J.P. Teixeira
INESC, IST, Apartado 10105, 1017 Lisboa Codex, Portugal
Keywords: Design for Testability, Fault Modelling, Layout Reconfiguration, Testability Analysis.
Abstract
Test quality requires the ability of test patterns to cover realistic faults originated by physical defects induced during IC manufacturing. Recent progress in a methodology for physical testability analysis is reported in this paper. A refined bridging fault classification provides evidence that reconvergent fan-out areas should be carefully designed to avoid hard to detect faults. Moreover, the concept of selective decompaction is introduced, to show that, with reduced area overhead, testability can be significantly increased. As a result, guidelines for cell library development, and for refined routing algorithms, are presented. The results are illustrated with several design examples. These examples also show that the realistic fault coverage can be higher or lower than the Line Stuck-At (LSA) fault coverage, depending on the relative incidence of bridging and open faults, and the topological characteristics of the surrounding circuit.
1. Introduction
Integrated Circuit (IC) quality is a must to ensure product competitiveness. Moreover, very low defect levels (on the order of 100 ppm) are specified for manufacturing quality control, which imposes a severe burden on IC test. We refer to the Defect Level (DL) as the percentage of defective parts that are misleadingly considered as good by production test. In order to achieve such low defect levels, in a cost effective way, very high LSA Fault Coverages (FC) are demanded in test preparation; in fact, it is reasonable to expect a correlation between DL and FC, and several authors have developed models to highlight such correlation [1-3]. However, as physical defects in digital CMOS do not map easily into LSA faults [4,5], one cannot be very confident in the assumed metric for test quality [3,6]. In other words, does FC=100% ensure DL=0? If not, how can we estimate DL, prior to manufacturing?
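As a concrete illustration of the kind of DL-FC correlation models cited in [1-3], the sketch below evaluates the well-known Williams-Brown relation DL = 1 - Y^(1-FC). The yield value and the function name are illustrative assumptions, not data or results from this paper.

def defect_level(yield_fraction: float, fault_coverage: float) -> float:
    """Williams-Brown estimate of defect level.

    DL = 1 - Y ** (1 - FC), where Y is the process yield (0..1)
    and FC is the fault coverage (0..1). This is one model of the
    kind cited in [1-3]; the paper's argument is that FC should be
    the realistic fault coverage, not the LSA coverage.
    """
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)

# Hypothetical example: 60% yield and 92.1% coverage give roughly
# 40,000 ppm escapes, illustrating why near-complete coverage of the
# right faults is needed to approach defect levels around 100 ppm.
if __name__ == "__main__":
    print(f"{defect_level(0.60, 0.921) * 1e6:.0f} ppm")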
An accurate evaluation of the defect level requires an evaluation of the realistic fault coverage, i.e., the ability of the test pattern to cover realistic faults [7]. Recently, a methodology for physical testability analysis was proposed [8-11], using technology information, the most likely physical defects (and their statistics, obtained from the process line), and the IC layout. This methodology enables a realistic fault extraction. For each fault, a Physical Failure Mode (PhFM) (e.g., shorts between adjacent metal-2 lines) is assigned, and a probability of its occurrence is evaluated. Then, fault classification, according to the faults' nature (node shorts, or breaks) and topology, is carried out. Such classification allows us to predict fault hardness prior to fault simulation, i.e., the difficulty of fault detection, by logic voltage testing, using a test pattern derived for LSA fault detection. Hard fault classes thus systematically exhibit lower fault coverages than easy to detect classes. The purpose of this paper is to present new advances in the application of the above mentioned methodology. First, special attention is dedicated to BRI (BRIdging) faults, since they are a dominant fault type in present day CMOS processes. A refined classification of BRI faults is presented and discussed in Section 2. Second, the local and selective application of physical DFT (Design For Testability) rules, for hard fault avoidance, is described in Section 3. In this section, the concept of selective decompaction is introduced, and guidelines for cell library development, and for refined routing algorithms, are presented. Simulation results, in real design examples, are shown in Section 4, highlighting the main results and their impact on testability, reliability and yield. Finally, directions for future work are outlined.
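The sketch below illustrates, in Python, the kind of per-fault record such an extraction step could produce; all class and field names are hypothetical and merely mirror the quantities named in the text (PhFM, occurrence probability, fault class, predicted hardness).

from dataclasses import dataclass
from enum import Enum

class Hardness(Enum):
    EASY = "easy"
    MODERATE = "moderate"
    DIFFICULT = "difficult"

@dataclass
class ExtractedFault:
    """One realistic fault extracted from the layout.

    Fields mirror the quantities the methodology assigns to each fault:
    its physical failure mode, its probability of occurrence (from the
    process defect statistics), the class it falls into after topological
    analysis, and the detection hardness predicted before simulation.
    """
    phfm: str              # e.g. "short between adjacent metal-2 lines"
    probability: float     # likelihood of occurrence, from defect statistics
    fault_class: str       # e.g. "BRI-3.1.1.2" (NFBF within an RFO area)
    hardness: Hardness     # predicted prior to any fault simulation

# A hypothetical extracted fault list would then be grouped by class to
# compute class incidences and, after simulation, class fault coverages.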
2. Bridging Faults Analysis
A significant part of realistic faults (60-95%), in present day digital CMOS processes, are BRI faults. Of these, the most significant share corresponds to LSA faults (15-30%) (BRI-1 class, in our classification) and shorts between logical nodes (40-65%), i.e., input/output nodes of Logic Elements (LE). We refer to these as BRI-3 faults. Hence, for this class, a more refined classification is proposed (see Table 1), where additional criteria for fault partitioning have been considered:
Fault Classes                                                      Detection
1. BRI between PI, PO or other logical nodes and VDD or VSS       easy
2. BRI between an electrical (E) node and VDD or VSS              difficult
3. BRI between logical (L) nodes:
   3.1 - Between LEs
      3.1.1 - NFBF:
         3.1.1.1 - Outside a RFO area                              easy
         3.1.1.2 - Within a RFO area                               moderate
      3.1.2 - FBF:
         3.1.2.1 - Outside a RFO area                              easy
         3.1.2.2 - Within a RFO area                               moderate
   3.2 - Within a LE:
      3.2.1 - Between inputs (NFBF)                                moderate
      3.2.2 - Between inputs/outputs of normal LEs (local FBF)     easy
      3.2.3 - Between inputs/outputs of transm. gates (local FBF)  moderate
4. BRI between L and E nodes:
   4.1 - Between LEs                                               easy
   4.2 - Within a LE                                               difficult
5. BRI between electrical nodes:
   5.1 - Between LEs                                               easy
   5.2 - Within a LE                                               difficult

PI, PO - primary input, output; LE - logic element
FBF, NFBF - feedback, non-feedback bridging fault
RFO - reconvergent fan-out

Table 1: BRI Fault Classification
- for BRI between LEs (3.1), the presence of feedback loops, and the possibility of shorts occurring within (or outside) a reconvergent fan-out (RFO) area;
- for BRI within LEs (3.2), the distinction among BRI between inputs, and between I/Os of normal LEs, and of transmission gate LEs.
As will be shown in Section 4, feedback BRI faults (FBF) are generally well covered. Automatic fault classification of FBF with even, odd or mixed number of inversions was also tried; however, from the detection point of view, there was no relevant difference, in view of their high coverage. Hence, this criterion was dropped. The most important issue we found is that the detection of BRI faults within RFO areas is significantly lower than outside such regions, which may be justified by the additional difficulties in fault propagation caused by RFO. Moreover, it was found that some BRIs between inputs of a LE (3.2.1) are undetectable, due to the fact that the 0 (or 1) dominance inhibits fault propagation. Such faults are usually dropped, but they pose a reliability problem.
3. Physical DFT Rules Application
The application of DFT always introduces overheads, and IC designers are eager to quantify additional costs. Layout level DFT is no exception. The application of layout rules for hard fault avoidance generally involves more Si area. In fact, open fault avoidance requires conductive path redundancy, while BRI fault avoidance (in fact, a lower probability of occurrence) calls for increased line spacing. Area overhead has two serious drawbacks: lower yield (thus, increased costs) and, eventually, speed degradation, due to physical nodes with larger area, and/or longer interconnections. Therefore, to guarantee that physical DFT is feasible (i.e., cost effective), the application of such rules needs to be kept to a minimum. Moreover, the definition of where and how this will be done needs to be automatable.
As a consequence, the concept of rule violation severity needs to be introduced. In fact, and unlike what happens in Design Rule Checking (DRC), action should not always be taken when a given DFT rule is violated. DRC deals only with geometrical dimensions; when layer widths or spacings are outside the allowed range, layout reconfiguration is mandatory. However, physical testability DRC deals with geometrical dimensions, topological characteristics of faults, their corresponding class incidence and the physical defect originating them. Class Fault Incidence [8] is a parameter that quantifies the relative importance of a set of faults, as compared with the whole realistic fault set. It can be defined as

Definition 1 - Weighted Class Fault Incidence - Given a circuit with a known transistor-level description, and a realistic fault set F, containing fault classes i, the weighted class fault incidence, FI_i, associated with a fault class i, is given by

FI_i = ( Σ_{j=1}^{M_i} p_j ) / ( Σ_{j=1}^{F_t} p_j )    (1)

where M_i is the number of listed faults belonging to class i, F_t is the total number of realistic faults in F, and p_j are the probabilities of occurrence of the listed faults. Similarly, it is possible to define FI_{i,j}, the fault incidence associated with a given fault class, i, and a given PhFM j. Of course, Σ_{j=1}^{R} FI_{i,j} = FI_i, where R is the total number of different PhFMs originating faults in class i. As shown, FI_i is not the simple percentage of class i faults, but a percentage weighted by the relative probability of the realistic faults' occurrence. This is important, in terms of realistic fault coverage, since, e.g., for metal-1 shorts, 15% of the extracted faults may correspond to FI_i = 50% [13].
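A minimal sketch of formula (1), assuming the extracted fault list is available as (class, probability) pairs; the function name and the example data are hypothetical.

from collections import defaultdict

def weighted_class_fault_incidence(faults):
    """Compute FI_i per formula (1): the probability-weighted share of
    each fault class within the realistic fault set F.

    `faults` is an iterable of (fault_class, probability) pairs; this
    input format is illustrative, not the paper's.
    """
    total = sum(p for _, p in faults)
    per_class = defaultdict(float)
    for cls, p in faults:
        per_class[cls] += p
    return {cls: s / total for cls, s in per_class.items()}

# Hypothetical example: a few high-probability shorts carry half the
# weighted incidence even though they are a minority of listed faults.
faults = [("BRI-3", 0.20), ("BRI-3", 0.20), ("BRI-3", 0.10),
          ("LOP-1", 0.05)] + [("BRI-1", 0.05)] * 9
print(weighted_class_fault_incidence(faults))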
A severe rule violation corresponds to a situation for which, cumulatively, a certain geometrical situation occurs (e.g., small distance between adjacent metal-1 lines), a certain topological situation occurs (e.g., BRI-2, or shorts between a power line and one internal node of a LE), a given PhFM is responsible (i.e., an interconnection failure, not a device failure) and a relevant class incidence occurs (e.g., FI_i = 5%). In our proposed methodology, circuit and fault extraction are followed by fault classification. Hence, non-empty hard fault classes are identified. Rules to increase the physical testability are associated with a certain class, and a certain PhFM. Since we would only reconfigure the layout for a relevant gain in realistic fault coverage, a measure of rule violation severity can be introduced as
Definition 2 - Rule Violation Severity - Given a circuit, with a known physical layout, and a realistic fault set F, the Rule Violation Severity RVS_{i,j}, associated with a given fault class i, and a given PhFM j, is evaluated by

RVS_{i,j} = a_j (1 - FC_i) FI_{i,j}    (2)

where FC_i is the class fault coverage, a_j = 1 (0) for interconnection (device) failures, and FI_{i,j} is the incidence of faults belonging to class i and caused by PhFM j.

The a_j parameter is introduced since device failures cannot be avoided by layout reconfiguration; hence, there is no point in assigning a non-zero value to the corresponding RVS_{i,j}. From (2), it can be seen that RVS_{i,j} ≤ FI_{i,j} ≤ 1.

[Table 2: LOP Fault Classification - only partially recoverable here. Identifiable classes: 1. LOPs in power supply lines (1.1 - between LEs); 2. LOPs disconnecting ... gates from power lines; 3. LOPs disconnecting paths from power, logical or electrical nodes (3.1 - disconnecting ... PU/PD networks of a LE; 3.2 - disconnecting partial ...). Detection ranges from easy to difficult. PU/PD - pull-up/pull-down (networks); LE - logic element.]
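A minimal sketch of formula (2); the function name and the example numbers are illustrative only.

def rule_violation_severity(is_interconnection: bool,
                            class_fault_coverage: float,
                            fault_incidence_ij: float) -> float:
    """Formula (2): RVS_{i,j} = a_j * (1 - FC_i) * FI_{i,j}.

    a_j is 1 for interconnection failures and 0 for device failures,
    since device failures cannot be avoided by layout reconfiguration.
    """
    a_j = 1.0 if is_interconnection else 0.0
    return a_j * (1.0 - class_fault_coverage) * fault_incidence_ij

# Hypothetical example: a hard BRI class with 70% coverage and a 5%
# incidence of metal-1 shorts gives RVS = 1 * 0.3 * 0.05 = 0.015,
# flagging those layout sites as candidates for reconfiguration.
print(rule_violation_severity(True, 0.70, 0.05))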
Open fault avoidance has been reported previously [11,14-17], and thus is only referred to here briefly. In some cases, it can be implemented without Si area overhead [11]. In all cases, the physical redundancy introduces some defect tolerance, which increases reliability and may increase product yield. Layout inspection should be carried out especially during cell library development, in order to ensure that hard to detect LOP (Line Open) and TSOP (Transistor Stuck-Open) faults are avoided (see Tables 2 and 3).
[Table 3: TSOP Fault Classification - table body not recoverable here.]
Bridging fault avoidance needs to be judiciously implemented, since just relaxing design rules would introduce an unacceptable area overhead. The introduction of physical DFT rules for BRI fault avoidance needs to be carried out both at cell implementation and at cell routing levels. In fact, one may avoid, by careful design, e.g., a hard to detect BRI between cell I/O nodes at cell design, but then, by cell routing, make it likely again. Moreover, the same cell may be easier or harder to test, depending on the circuit topology. In particular, a cell within a RFO area is more difficult to test for the realistic faults.

As a consequence, a new design philosophy should be implemented, which influences design methodology and CAD tool development. Physical design's testability should be monitored in the following aspects:

1. identification of difficult to test sub-circuits, such as reconvergent fan-out areas (this may be carried out at gate level). A solution in this case can be cell versioning. Two versions of the same cell should be laid out: the normal version (designed for performance), and the more testable version, with more relaxed design rules (rules to be used only in critical areas, where hard BRI faults could occur), eventually with larger area. Both versions should be designed with the same power supply rails and cell height dimensions. The version of the cell that must be instantiated differs depending on whether the cell is to be included in a RFO area or not. Circuit analysis, at gate level, can identify the LEs which are within RFO areas, and thus guide the instantiation (a sketch of this selection step is given after this list). Experience will have to determine whether all or just some of the cells require two versions.
2. identification of non-conventional (i.e., non full CMOS) circuit implementations of logic blocks. Such physical designs are usually less testable than full CMOS; hence, their use should also be restricted. These are cells for which two versions may be especially important. When they are instantiated to be included in a RFO area, the comments above apply.

3. identification of the fault hardness of faults associated with the routing patterns. Such patterns are usually long, as compared with the patterns of the cells; thus, their incidence is very significant. In particular, the routing connecting cells within RFO areas may introduce hard faults. Therefore, they should be routed with more relaxed spacing between adjacent conductive lines of the same material. This leads to the possibility of developing a new generation of routers, with testability constraints.
4. identification of wasted areas. After cell routing, layout inspection usually reveals very compact areas, and areas where sparse patterns are drawn. This means that, without increasing the Si area, it is possible to selectively decompact the interconnection lines. Thus we not only decrease the likelihood of occurrence of BRI faults (increasing testability), but also increase yield, since, for the same defect statistics, defects will be less likely to introduce a physical failure. The management of the available area should not be performed uniformly; preferably, hard BRI faults (associated with specific conducting lines, identified a priori) should be avoided. Again, the development of a new generation of CAD tools, namely for selective decompaction, will be required.
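The following sketch illustrates the gate-level selection step mentioned in item 1: identifying LEs that lie within reconvergent fan-out areas so that the more testable cell version can be instantiated there. The netlist representation, the coarse reconvergence criterion and all names are assumptions made for illustration, not part of the methodology described above.

from collections import defaultdict

def reachable(netlist, start):
    """All gates reachable from `start` by following fanout edges.
    `netlist` maps a gate name to the list of gates it drives."""
    seen, stack = set(), [start]
    while stack:
        g = stack.pop()
        for succ in netlist.get(g, []):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen

def gates_in_rfo_areas(netlist):
    """Gates where two or more branches of some fan-out stem reconverge.

    This is a coarse, illustrative criterion: a gate reachable from at
    least two distinct fanout branches of the same stem lies at (or
    beyond) a reconvergence point, and would be instantiated with the
    'testable' cell version rather than the performance-oriented one.
    """
    marked = set()
    for stem, branches in netlist.items():
        if len(branches) < 2:
            continue
        hit_count = defaultdict(int)
        for b in branches:
            for g in reachable(netlist, b) | {b}:
                hit_count[g] += 1
        marked |= {g for g, n in hit_count.items() if n >= 2}
    return marked

# Hypothetical netlist: g1 fans out to g2 and g3, which reconverge at g4.
netlist = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"], "g4": []}
print(gates_in_rfo_areas(netlist))   # {'g4'}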
4. Simulation Results and Discussion
A significant number of designs has been analysed, using different technologies, defect statistics and layout styles. Here, some examples are included to demonstrate the major conclusions of this work. First, a set of examples is presented, of a physical implementation of boundary scan circuitry, from a European manufacturer. This circuit, referred to as BST, has a TAP (Test Access Port) interface, with 14 input and 8 output cells, laid out with around 2,000 transistors building 700 LEs. A test pattern of 1100 test vectors, generated for LSA fault coverage, was used to evaluate the realistic fault coverage, FC_r. The gate level LSA fault coverage, FC, is 92.1%. Two defect statistics were used: FF1, a failure file where open faults were given a considerable importance, and FF2, a failure file for which a much stronger influence of shorts was implemented. FF2 is closely based on the defects data of a real 2-metal CMOS process line. The results are shown in Figs. 1 and 2, where the fault incidences, according to the assumed PhFMs and fault classes, and the realistic class fault coverages are depicted, respectively. FF1 leads to FC_r = 87.4%, with partial coverages of 80.1, 92.8 and 71.9% for LOP, BRI and TSOP faults. In contrast, FF2 leads to FC_r = 94.3%, with partial coverages of 76.2, 95.7 and 67.1% for LOP, BRI and TSOP faults. This shows that, generally, open faults tend to be harder to detect by LSA test sets than BRI faults, which may pose problems especially for testing during the product lifetime, since phenomena like electromigration tend to increase the likelihood of open faults. Moreover, it shows that the realistic fault coverage can be higher or lower than the LSA fault coverage, depending on the relative incidence of bridging and open faults, and the topological characteristics of the surrounding circuit. As a consequence, the use of FC as a metric of test quality, in particular for defect level evaluation, is unreliable. In present day CMOS processes, where BRI faults dominate, for FC = 100%, FC_r