
Almost Sure Diagnosis of Almost Every Good Element

Larry LaForge, Kaiyuan Huang, and Vinod K Agarwal
VLSI Design Laboratory, Department of Electrical Engineering
McGill University
Room 633, McConnell Engineering Building
3480 University Street
Montreal, PQ, Canada H3A 2A7

Telephone: (514) 398-7074

This is an overview of a technical report by the same title, McGill Department of Electrical Engineering. Complete details have been submitted to the IEEE Transactions on Computers, but the manuscript exceeds by thirty pages the four-page limit of this workshop.

1991 International Workshop on Defect and Fault Tolerance on VLSI Systems
0-8186-2457-4/91 $1.00 © 1991 IEEE

Keywords: system level diagnosis, probabilistic diagnosis, test digraphs, constant-degree diagnosis, self testing architectures, wafer-scale testing.

In the interest of practicality suppose that we want to diagnose the dice on a semiconductor wafer, and that the processing element on each die contains logic that can test another die. The test circuitry is simplified when each die is the same, and this is an assumption of our model. By using a large number of randomized test vectors we can, with high probability, decide whether an element is faulty [Muradali 1990]. This situation is approximated by [Preparata, Metze, Chien 1967], wherein 1) the results reported by any good element are presumed to be valid and 2) faulty elements may lie or tell the truth. We adopt this model, the basis of which is a test digraph on the set of N elements: (u, w) is an arc if and only if element u tests element w.

Now suppose that tests are assigned among the elements so that the number of tests performed by or on any two elements differ by at most 1; that is, the underlying digraph is regular or nearly so. Imagine that circuitry on the wafer accurately reports the outcome of all tests between elements. The ensemble of these pass-fail results is known as a syndrome and serves as the input to a diagnosis algorithm. The algorithm executes on a fault-free Turing machine equivalent and may contain prior knowledge of the underlying digraph.

If pairs of elements test each other then the underlying digraph may be viewed as an undirected graph. By considering only whether pairs of elements declare each other good, the digraph model specializes to a comparison model. Under the latter we presume fault-free comparison circuitry that reports whether, for identical test input, the output of pairs of elements is the same. Our presentation focuses on the digraph model, but our results apply to the comparison model as well.

Our digraph should have a structure that is readily implemented on the wafer. We would also like to minimize 1) the time it takes to compute a syndrome, 2) the circuitry required to report the syndrome, and 3) the time it takes for our diagnosis algorithm to identify the good elements. Goal (1) is facilitated by the use of built-in self test instead of probe test. The condition of regularity assists in achieving goals (1) and (2). Goals (2) and (3) are most easily met when we minimize the test redundancy. The latter is defined as the number of tests per element and is the theme of this as well as previous works.

What are necessary and sufficient conditions such that any set of t or fewer faulty elements (whose test reports may be anything) can be identified with certainty? This is the problem of worst-case diagnosis. Necessary conditions are given by [Preparata, Metze, Chien 1967]: 1) the number of faulty elements must be less than the number of good elements; 2) the worst-case diagnosability t cannot exceed the minimum in-degree of any vertex. Furthermore, the authors construct t-diagnosable digraphs containing an optimum number tN of arcs.
If, for constant p, we let t = pN, then the number of tests is at least pN^2; i.e., the worst-case test redundancy pN is linear in the number of elements.
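
The test digraph, the syndrome, and the tN arc count can be made concrete with a small sketch. It assumes a ring-shaped (circulant) digraph in which element u tests the next t elements; that layout, the function names, and the choice to let faulty testers report random bits are illustrative assumptions, not details taken from the report.

```python
import random

def circulant_test_digraph(n, d):
    """Arcs (u, w): element u tests the next d elements around a ring.
    Hypothetical layout, chosen because it is exactly regular."""
    return [(u, (u + k) % n) for u in range(n) for k in range(1, d + 1)]

def syndrome(arcs, faulty, rng):
    """Pass-fail outcomes, 0 = pass and 1 = fail.
    Good testers report the truth; faulty testers report an arbitrary bit
    (modelled here as a random bit, which is an assumption)."""
    outcomes = {}
    for u, w in arcs:
        if u in faulty:
            outcomes[(u, w)] = rng.randint(0, 1)
        else:
            outcomes[(u, w)] = 1 if w in faulty else 0
    return outcomes

n, p = 20, 0.25
t = int(p * n)                       # worst-case fault bound t = pN
arcs = circulant_test_digraph(n, t)  # every in-degree and out-degree equals t
rng = random.Random(0)
faulty = set(rng.sample(range(n), t))
s = syndrome(arcs, faulty, rng)

# Total tests tN equals pN^2; tests per element equals pN.
print(len(arcs), t * n, p * n * n, len(arcs) / n)
```

Doubling N while holding p fixed doubles the tests per element, which is the linear growth in test redundancy described above.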


By contrast, [Scheinerman 1987] and [Blough 1988] study probabilistic diagnosis under a binomial distribution of faulty elements. For any regular digraph, Ω(N log N) arcs are necessary such that, with probability approaching 1, every good element can be correctly identified. [LaForge 1991] moreover matches this bound with a deterministic structure: every element in an N-element array with local spares is almost surely diagnosable at Θ(log N) test redundancy. As with worst-case diagnosis, the number of arcs depends as well on the element fail rate. Also, condition (1) above remains in effect: on average no more than half the elements may be faulty.

The test redundancy of worst-case diagnosis is linear in the count of elements. If instead we employ almost sure diagnosis of every good element then the test redundancy reduces to a logarithmic factor. Our work shows how, by relaxing the requirement that all good elements be identified, we can accommodate a number of tests that is proportional to the number of elements. The maximum degree of the corresponding test digraph is bounded by a constant that is arguably practical. With high probability all but a specified fraction of good elements can be identified. Under modest assumptions about the behavior of faulty elements our results pertain even when the element fail rate is greater than 1/2.

Diagnosis with constant-degree digraphs is of practical interest to wafer-scale testing. In lieu of wafer probe we suggest the use of self-testing chips whose test arcs follow switchable interconnect laid along the scribe lines. Diagnosis of almost every good element is appropriate when it is acceptable for the manufacturer to throw away a small fraction of the good elements, but it is unacceptable for the manufacturer to package and sell a faulty chip.

Our results are new to the theory of system level diagnosis. Our approach offers an alternative to the testing of chips that have been probed or packaged.
We suspect that this approach may be economically advantageous, and suggest that implementation of our proposed heuristic would be a worthwhile experiment.


References

[Blough 1988] Douglas M Blough. Fault Detection and Diagnosis in Multiprocessor Systems. PhD thesis. Baltimore: Johns Hopkins University, 1988.

[LaForge 1991] Laurence E LaForge. Fault Tolerant Arrays. PhD thesis. Montreal: School of Computer Science, McGill University, May 1991.

[Muradali 1990] Fidel Muradali. A New Procedure for Weighted Random Built-in Self Test. PhD thesis. Montreal: Department of Electrical Engineering, McGill University, March 1990.

[Preparata, Metze, Chien 1967] F Preparata, G Metze, and R Chien. “On the Connection Assignment Problem of Diagnosable Systems”. IEEE Transactions on Computers. EC-16. 1967. pp 848-854.

[Scheinerman 1987] Edward R Scheinerman. “Almost Sure Fault Tolerance in Random Graphs”. SIAM Journal of Computing. Vol 16, No 6, December 1987. pp 1124-1134.

Diagnosis of syndrome based on the largest connected component of the corresponding agreement graph.
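
The caption above names a largest-connected-component rule, but this overview does not define the agreement graph. The sketch below is one plausible reading and should not be taken as the report's heuristic: it assumes elements test each other in pairs within ring distance d, joins two elements by an edge when each passes the other, and declares the largest connected component good. All function names and parameters are hypothetical.

```python
import random
from collections import defaultdict

def mutual_ring_tests(n, d):
    """Arcs (u, w) in both directions for every pair within ring distance d."""
    return [(u, (u + k) % n) for u in range(n)
            for k in range(-d, d + 1) if k != 0]

def pmc_syndrome(arcs, faulty, rng):
    """0 = pass, 1 = fail; good testers are truthful, faulty ones random."""
    return {(u, w): (rng.randint(0, 1) if u in faulty else int(w in faulty))
            for u, w in arcs}

def largest_agreement_component(n, syndrome):
    """Assumed agreement graph: edge {u, w} iff u passes w and w passes u."""
    adj = defaultdict(set)
    for (u, w), outcome in syndrome.items():
        if outcome == 0 and syndrome.get((w, u)) == 0:
            adj[u].add(w)
            adj[w].add(u)
    seen, best = set(), set()
    for start in range(n):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:                      # depth-first search from start
            v = stack.pop()
            for x in adj[v]:
                if x not in component:
                    component.add(x)
                    stack.append(x)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

n, d = 12, 2
faulty = {3, 7}
s = pmc_syndrome(mutual_ring_tests(n, d), faulty, random.Random(1))
diagnosed_good = largest_agreement_component(n, s)
print(sorted(diagnosed_good))
```

Because a good element always fails a faulty neighbour, no edge joins a good element to a faulty one, so each component is entirely good or entirely faulty. The rule therefore errs only when some faulty component outgrows the largest good one, or when good elements are cut off from it and discarded.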
