ATPG PADDING AND ATE VECTOR REPEAT PER PORT FOR REDUCING TEST DATA VOLUME
Harald Vranken (1), Friedrich Hapke (2), Soenke Rogge (2), Domenico Chindamo (3), and Erik Volkerink (4)
(1) Philips Research, Digital Design & Test, Prof. Holstlaan 4, WAY-41, 5656 AA Eindhoven, The Netherlands
(2) Philips Semiconductors, Design Technology Center, Georg-Heyken Strasse 1, D-21147 Hamburg, Germany
(3) Agilent Technologies, DfT Center of Expertise, Via del Tintoretto 200, 00142 Rome, Italy
(4) Agilent Technologies, Systems and Solutions Lab., 3500 Deer Creek Road, Palo Alto, CA 94304-1392, USA
Abstract
This paper presents an approach for reducing the test data volume that has to be stored in ATE vector memory for IC manufacturing testing. We exploit the capabilities of present ATE to assign groups of input pins to ports and to perform vector repeat per port. This allows run-length encoding of test stimuli per port. We improve the encoding by filling the don’t-care bits in the test stimuli, such that longer run-lengths are obtained. We provide a probabilistic analysis of the performance of vector repeat per port with various ATPG padding types. We further discuss the impact of ATE architectures. The paper provides experimental data for a set of large industrial circuits, which shows an average reduction of the test stimulus data volume by a factor of 13.
1. Introduction
Advances in design methods and process technologies are causing a continuous increase in the complexity of integrated circuits (ICs). IC process technologies provide the capability to integrate more and more transistors on a single IC, and core-based design methods allow the design of large SoCs (systems-on-a-chip). The test data volumes for IC manufacturing test are currently increasing even faster than the numbers of transistors on ICs, since new design styles and defect types require additional test patterns to sustain high test quality. ICs with multiple clocks running at different frequencies require different test methods than ICs that have only a single clock source. Nanometer process technologies introduce new defect types like resistive opens that affect the IC timing behavior, and hence require additional test methods like delay fault testing. The increase in test data volume causes a number of critical problems related to ATE (automated test equipment) utilization, which has a serious impact on test cost and test quality. These problems are particularly associated with
data upload time, test application time, and ATE vector memory. Large test data volumes result in long upload times for transporting test data from a workstation to the ATE memory [29]. This upload time often ranges from tens of minutes to hours. Since the ATE remains idle during this upload time, the ATE utilization is reduced. When executing an IC test, test stimuli and responses are exchanged between the IC and ATE. However, the I/O bandwidth between the IC and ATE, as defined by the number of IC pins/ATE channels and their data rates, is not keeping pace with the increasing test data volume. This bandwidth bottleneck leads to an almost exponential increase in test application time, which significantly increases the costs of test [15]. The ATE vector memory is limited, and hence an increase in test data volume may exceed the size of the available memory. One way to deal with this is to truncate the test data: only the test data that fits into the ATE vector memory is actually applied, while the remaining test data is skipped. This however implies reduced test quality and is therefore in general not acceptable. An alternative is to split the test data into several sets, such that each individual set fits into the ATE vector memory. The IC test is then performed in multiple stages and in each stage the ATE vector memory is reloaded with a new test set. This however increases the overall test time and reduces the ATE utilization. Another alternative is to upgrade the ATE by extending the vector memory. However, vector memory in traditional ATE is expensive and the ATE architecture imposes an upper limit on the memory that can be added. An even more expensive solution is to install new ATE that offers more vector memory. It is clear that none of the above solutions is preferred for dealing with increasing test data volumes. In this paper, we propose an alternative approach for reducing the test data volume that has to be stored on the
ITC INTERNATIONAL TEST CONFERENCE 0-7803-8106-8/03 $17.00 Copyright 2003 IEEE
Paper 41.2 1069
ATE. Our approach is based on two key ideas to compress test stimuli: (1) exploiting the ATE capabilities for vector repeat per port, and (2) exploiting the large number of don’t-care bits in test stimuli. Most present ATE support vector repeat, which allows a sequence of n identical test vectors to be stored on the ATE as a single test vector plus an instruction to repeat this vector n times. When n is sufficiently large, this considerably reduces the test data volume compared to the traditional approach in which all n vectors are stored. The concept of vector repeat corresponds to run-length encoding for a sequence of test vectors, where each codeword consists of a single test vector and its run-length. We improve the efficiency of vector repeat by applying vector repeat per port, where a port corresponds to a group of pins. Vector repeat per port yields longer run-lengths and hence larger test data reduction than the traditional approach of vector repeat on all IC pins. We improve the efficiency of vector repeat further by tuning the test vectors. It is well known that the percentage of care bits in test sets for large ICs is typically in the range of only 1 to 5%. Traditionally, the don’t-care bits are filled with random values during ATPG, which is referred to as random padding. We maximize the number of times that a vector can be repeated by modifying the ATPG padding. The paper contains the following novel contributions: (1) we introduce the concept of vector repeat per port; (2) we analyze the performance of vector repeat per port when using various ATPG padding types; (3) we discuss the impact of ATE architectures on vector repeat per port; and (4) we provide experimental data on a set of large industrial circuits. The remainder of this paper is organized as follows. Section 2 describes prior work on test data compression. Section 3 presents an analysis of ATPG padding with vector repeat per port.
Section 4 discusses ATE architectures and their impact on vector repeat per port. Section 5 presents experimental results, and Section 6 concludes the paper.
2. Prior Work
The use of ATE vector repeat for reducing test data volume has been proposed previously in [1][22]. The work in [1] also enhances the ATPG padding to improve the efficiency of vector repeat. However, both papers apply vector repeat to all input pins at once rather than per port, which gives less favorable results. In recent years, a large number of publications appeared on test data compression techniques that rely on DfT (design-for-testability). Most techniques exploit the large fraction of don’t-care bits for compressing test stimuli. The test stimuli are stored on the ATE in a compressed format, and
are transported as such to the IC. The IC contains on-chip decompression circuitry to decompress the test stimuli and feed a large number of scan chains. Prior work on test data compression can be classified in several categories. The most efficient techniques rely on LFSR (linear-feedback shift register) reseeding, in which the care bits are encoded as LFSR seeds. During test execution, the seeds are loaded into the LFSR from the ATE, while the don’t-care bits are filled pseudo-randomly by the LFSR. The original work in [17] uses static LFSR reseeding. Improved approaches apply a multiple-polynomial LFSR [9], an LFSR with variable-length seeds [25], or multiple LFSRs in virtual scan chains [14]. The best results so far are obtained with dynamic LFSR reseeding [18][19][20][27]. Besides LFSRs, other hardware structures have also been proposed, such as XOR-networks for scan chain concealment [2] and folding counters [21]. Other approaches rely on adjusting the scan chains, such as Illinois Scan [8] and RESPIN [7]. Another category of test data compression techniques relies on coding techniques to encode test stimuli, using either Huffman codes and statistical codes [11][12][13], run-length codes [12], alternating run-length codes [6][10], Golomb codes [5], frequency-directed run-length (FDR) codes [4], or packet-based codes [16][28]. In most coding techniques, the don’t-care bits are filled to improve the encoding efficiency. The test data compression techniques as described above are very effective for reducing both test data volume and test application time. Most techniques require on-chip decompression circuitry, which implies some additional silicon area, a more complicated design flow, and the need for proper tool support. Other techniques require modifications of the ATE or additional off-chip hardware.
Coding techniques that encode variable-length blocks of test data or use variable-length codewords, require synchronization between the IC and ATE which is hard to realize in practice. Our approach avoids most of these disadvantages, since it does not require DfT for stimulus compression. Our approach can further be realized with present ATE, and hence it does not rely on capabilities that will only be supported by future ATE. Although our approach may not outperform prior approaches in terms of compression ratio, it is much easier to apply in practice. The concept of ATE vector repeat may furthermore also be applied for ICs that do contain DfT for test data compression, to further compress the data to be stored on the ATE. Vector repeat is less efficient for compressing test responses. It is therefore preferred to compact test responses using e.g., a MISR (multiple-input signature register) [1] or compaction logic [24]. Built-in self-test (BIST) reduces the amount of test data to be stored on the ATE to an absolute minimum, since the ATE only has to provide the control sequence to start the
BIST and to capture the pass/fail result or a signature. BIST implements on-chip generation of test stimuli and analysis of test responses. BIST for embedded memories is generally accepted nowadays. The use of logic BIST is increasing also, but it is generally not considered as the preferred solution for low-cost manufacturing test [26], mainly due to its impact on design flows, required design modifications, and silicon area.
3. ATPG Padding with Vector Repeat
3.1 Vector Repeat Per Port
We introduce the concept of vector repeat per port, which means applying vector repeat per groups of pins. A port of size k refers to a group of k pins. Let m denote the number of ports, and $p_i$ the size of port i ($1 \le i \le m$). The size of a port can range from a single pin to all IC pins. Hence, for an IC with n pins, the n pins are assigned to the m ports such that $\forall_{1 \le i \le m}\; 1 \le p_i \le n$ and $\sum_{1 \le i \le m} p_i = n$. Vector repeat per port can mean one of the following:
• Repeat-per-pin. In this case there are m = n ports and each port contains only a single pin ($\forall_{1 \le i \le m}\; p_i = 1$).
• Repeat-per-pin-group. In this case there are m ports (1 < m < n) that contain one or multiple pins. The number of pins may vary per port ($\forall_{1 \le i \le m}\; 1 \le p_i \le n$).
• Repeat-per-all-pins. In this case there is only a single port (m = 1) that contains all pins ($p_1 = n$).
Figure 1: (a) Repeat-per-pin, (b) repeat-per-pin-group, and (c) repeat-per-all-pins
Figure 1 shows an example for 4 pins and 5 test vectors. Each row corresponds to test vector data and each column corresponds to the bit sequence at one of the pins. For instance with repeat-per-pin (Figure 1a), the sequence at the first pin can be encoded as 0 repeated four times, followed by 1. In repeat-per-pin-group (Figure 1b), the pins are assigned to two ports of size two each. The sequence on the first port can be encoded as vector 01, followed by vector 00 repeated three times, followed by vector 10.
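The encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names and the data are hypothetical. Only the values on pins 1 and 2 follow the description of Figure 1 in the text; the values on pins 3 and 4 are made up for the example.

```python
from itertools import groupby


def run_length_encode(symbols):
    """Collapse consecutive equal symbols into (symbol, repeat_count) pairs."""
    return [(sym, len(list(group))) for sym, group in groupby(symbols)]


def encode_per_port(vectors, ports):
    """Run-length encode the vector sequence separately for each port.

    vectors: list of equal-length bit strings, one per test vector (row).
    ports:   list of lists of 0-based pin indices; each inner list is a port.
    """
    encoded = []
    for pins in ports:
        # Project every test vector onto the pins that belong to this port.
        column = ["".join(vec[p] for p in pins) for vec in vectors]
        encoded.append(run_length_encode(column))
    return encoded


# Hypothetical data: 5 test vectors on 4 pins (pins 3 and 4 are invented).
vectors = ["0101", "0001", "0001", "0010", "1010"]

per_pin = encode_per_port(vectors, [[0], [1], [2], [3]])
per_group = encode_per_port(vectors, [[0, 1], [2, 3]])
all_pins = encode_per_port(vectors, [[0, 1, 2, 3]])

print(per_pin[0])    # codewords for the first pin
print(per_group[0])  # codewords for the first port of size two
print(all_pins[0])   # codewords when all pins form a single port
```

In this made-up data set the first pin needs two codewords, the first two-pin port needs three, and the single port over all pins needs four, which illustrates why smaller ports tend to give longer runs.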
3.2 ATPG Padding in Repeat-Per-Pin
It is well known that the fraction of care bits in test pattern sets for large ICs is typically between 1 and 5%. The don’t-care bits are usually filled with random values. In principle, the don’t-care bits may be filled with arbitrary values, although the padding may have an impact on the actual defect coverage [23]. In this paper we explore four different ATPG padding strategies: random-fill (fill each don’t-care bit with a random value), 0-fill (fill each don’t-care bit with zero), 1-fill (fill each don’t-care bit with one), and repeat-fill (fill each don’t-care bit with the most recent care bit). An example is shown in Table 1 for bit sequence 1xx0xx. The last column shows how the padded sequence can be encoded using run-length encoding. Assuming the padded sequence starts with 1, run-length code 1,5 corresponds to a single 1 followed by five 0’s. The example clearly shows that modifying the ATPG padding strategy allows more efficient run-length encoding.

| ATPG padding | Padded sequence | Run-length code |
|---|---|---|
| random-fill | 101010 | 1,1,1,1,1,1 |
| 0-fill | 100000 | 1,5 |
| 1-fill | 111011 | 3,1,2 |
| repeat-fill | 111000 | 3,3 |

Table 1: Padding and encoding of sequence 1xx0xx
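The deterministic rows of Table 1 can be reproduced with a short Python sketch. The function names are hypothetical, and the `initial` parameter (the fill value used by repeat-fill before the first care bit is seen) is an assumption of this sketch, since the text does not specify that case. Random-fill is left out because its result depends on the random draw.

```python
def pad(sequence, strategy, initial="0"):
    """Fill don't-care bits ('x') in a per-pin bit sequence.

    strategy: '0-fill', '1-fill', or 'repeat-fill'.
    initial:  assumed fill value for don't-cares that precede the first
              care bit (only used by repeat-fill; an assumption here).
    """
    out = []
    last_care = initial
    for bit in sequence:
        if bit != "x":
            last_care = bit
            out.append(bit)
        elif strategy == "0-fill":
            out.append("0")
        elif strategy == "1-fill":
            out.append("1")
        elif strategy == "repeat-fill":
            out.append(last_care)
        else:
            raise ValueError(f"unknown padding strategy: {strategy}")
    return "".join(out)


def run_lengths(bits):
    """Lengths of the alternating runs in a padded bit string."""
    lengths = []
    for i, bit in enumerate(bits):
        if i > 0 and bit == bits[i - 1]:
            lengths[-1] += 1
        else:
            lengths.append(1)
    return lengths


for strategy in ("0-fill", "1-fill", "repeat-fill"):
    padded = pad("1xx0xx", strategy)
    print(strategy, padded, run_lengths(padded))
```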
The average run-length with repeat-per-pin and various padding strategies can be estimated by the following probabilistic analysis. For a given bit sequence, we define p0 as the probability that a bit is 0, p1 as the probability that a bit is 1, and px as the probability that a bit is don’t care, while p0 + p1 + px = 1. We encode the padded sequence using an alternating run-length code, in which the bit sequence is encoded as an alternating sequence of runs of zeros and runs of ones. The probabilities for runs of zeros and runs of ones of a certain length, both have geometric distributions. The average run-length of the runs of zeros and runs of ones follows from these geometric distributions, and yields the following equations for the various padding strategies:

$$\varepsilon_{\text{random-fill}} = \frac{2}{(1 - p_0 + p_1)(1 + p_0 - p_1)} \qquad (1)$$

$$\varepsilon_{\text{0-fill}} = \frac{1}{2\,p_1 (1 - p_1)} \qquad (2)$$

$$\varepsilon_{\text{1-fill}} = \frac{1}{2\,p_0 (1 - p_0)} \qquad (3)$$

$$\varepsilon_{\text{repeat-fill}} = \frac{p_0 + p_1}{2\,p_0\,p_1} \qquad (4)$$
For example, it follows from equations (1)-(4) with p0=0.01, p1=0.02, and px=0.97, that the average run-length is 2.0 with random-fill, 25.5 with 0-fill, 50.5 with 1-fill, and 75.0 with repeat-fill. The following observations can be derived from equations (1)-(4):
• The average run-length with repeat-fill is always larger than with 0-fill, 1-fill, or random-fill (provided px > 0).
• The average run-lengths with 0-fill and 1-fill are identical if p0 = p1. If p0 > p1, then the average run-length with 0-fill is larger than with 1-fill, and vice versa.
• For p0 ≈ p1, the average run-length with 0-fill or 1-fill is larger than with random-fill.
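Equations (1)-(4) are easy to evaluate numerically, and a small Monte Carlo experiment on a single pin gives an independent sanity check. The sketch below is illustrative only: the function names, the sample size, the seed, and the choice of 0 as the repeat-fill value before the first care bit are all assumptions, not part of the original analysis. The estimate is simply the number of padded bits divided by the number of runs.

```python
import random


def expected_run_lengths(p0, p1):
    """Average run-lengths for repeat-per-pin from equations (1)-(4)."""
    return {
        "random-fill": 2.0 / ((1 - p0 + p1) * (1 + p0 - p1)),
        "0-fill": 1.0 / (2 * p1 * (1 - p1)),
        "1-fill": 1.0 / (2 * p0 * (1 - p0)),
        "repeat-fill": (p0 + p1) / (2 * p0 * p1),
    }


def simulate_run_length(p0, p1, padding, n_bits=200_000, seed=1):
    """Monte Carlo estimate of the average run-length on one pin."""
    rng = random.Random(seed)
    last_care = 0     # assumed value before the first care bit (repeat-fill)
    previous = None   # previous padded bit
    runs = 0
    for _ in range(n_bits):
        r = rng.random()
        if r < p0:                      # care bit 0
            bit = 0
            last_care = 0
        elif r < p0 + p1:               # care bit 1
            bit = 1
            last_care = 1
        elif padding == "0-fill":       # don't-care bit, padded below
            bit = 0
        elif padding == "1-fill":
            bit = 1
        elif padding == "repeat-fill":
            bit = last_care
        else:                           # random-fill
            bit = rng.randint(0, 1)
        if bit != previous:
            runs += 1
            previous = bit
    return n_bits / runs


theory = expected_run_lengths(0.01, 0.02)
print({name: round(value, 2) for name, value in theory.items()})
print(round(simulate_run_length(0.10, 0.10, "repeat-fill"), 2))
```

With p0 = 0.01 and p1 = 0.02 the closed forms give the values quoted in the text, and the simulated estimates should land close to the corresponding closed-form values for any of the four padding types.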
3.3 ATPG Padding in Repeat-Per-Pin-Group
For repeat-per-pin-group with port-size k>1, probabilistic analysis does not yield simple equations. We therefore rely on simulation to estimate the average run-length with repeat-per-pin-group. We generate a sufficiently large number of random vectors, which are k bits wide, using a uniform distribution for 0, 1, and don’t-care bits, as specified by p0, p1, and px. The don’t-care bits are filled according to a given padding strategy, and run-length encoding is applied to the padded sequence.
The simulation results are shown in Table 2. It can be seen that the average run-length decreases for all padding types with increasing port-size. This is as expected, since the probability of having a sequence of identical vectors decreases if the vector-width increases. The average run-length furthermore decreases for all padding types if px decreases. This is also as expected, since reducing the number of don’t-care bits implies less freedom for padding, and hence less freedom to optimize the don’t-care bits for run-length encoding. It can further be seen that the simulation results for port-size k=1 (i.e. repeat-per-pin) match the theoretical run-lengths as given by equations (1)–(4). For both repeat-per-pin and repeat-per-pin-group, repeat-fill gives the best results and yields the largest average run-lengths.

| p0 | p1 | px | port-size | εrandom-fill | ε0-fill | ε1-fill | εrepeat-fill |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.01 | 0.98 | 1 | 2.00 | 50.51 | 50.51 | 99.97 |
| | | | 2 | 1.33 | 25.50 | 25.50 | 50.28 |
| | | | 4 | 1.07 | 13.01 | 13.01 | 25.38 |
| | | | 8 | 1.00 | 6.76 | 6.76 | 12.95 |
| | | | 16 | 1.00 | 3.65 | 3.65 | 6.73 |
| | | | 32 | 1.00 | 2.12 | 2.12 | 3.64 |
| | | | 64 | 1.00 | 1.39 | 1.39 | 2.11 |
| | | | 128 | 1.00 | 1.08 | 1.08 | 1.38 |
| 0.01 | 0.02 | 0.97 | 1 | 2.00 | 25.51 | 50.51 | 74.98 |
| | | | 2 | 1.33 | 13.01 | 25.49 | 37.74 |
| | | | 4 | 1.07 | 6.77 | 13.00 | 19.13 |
| | | | 8 | 1.00 | 3.65 | 6.76 | 9.82 |
| | | | 16 | 1.00 | 2.12 | 3.65 | 5.17 |
| | | | 32 | 1.00 | 1.39 | 2.12 | 2.86 |
| | | | 64 | 1.00 | 1.08 | 1.39 | 1.74 |
| | | | 128 | 1.00 | 1.01 | 1.08 | 1.22 |
| 0.10 | 0.10 | 0.80 | 1 | 2.00 | 5.56 | 5.56 | 10.00 |
| | | | 2 | 1.33 | 3.05 | 3.05 | 5.26 |
| | | | 4 | 1.07 | 1.82 | 1.82 | 2.91 |
| | | | 8 | 1.00 | 1.26 | 1.26 | 1.76 |
| | | | 16 | 1.00 | 1.04 | 1.04 | 1.23 |
| | | | 32 | 1.00 | 1.00 | 1.00 | 1.04 |
| | | | 64 | 1.00 | 1.00 | 1.00 | 1.00 |
| | | | 128 | 1.00 | 1.00 | 1.00 | 1.00 |

Table 2: Average run-length (simulation results)

4. ATE Architectures
4.1 Shared Resources vs. Per-Pin Architecture
The ability to perform repeat-per-pin, repeat-per-pin-group, or repeat-per-all-pins depends on the ATE architecture. A centralized ATE architecture with shared resources seems attractive for reducing ATE costs. However, this approach often results either in limited ATE capabilities, or in multiple central resources to still allow sufficient granularity per pin. For instance, a centralized architecture with a common resource for timing generation prevents running scan chains at different frequencies. In SoC testing, it is however desired that the various cores can be tested at different frequencies during the scan capture cycles for AC scan/delay fault testing, to match the different application frequencies of these cores. Tuning the scan shift frequencies per core furthermore allows for reducing test time.
An ATE usually contains a sequencer, which is a programmable processing unit for controlling the test execution. A sequencer instruction is e.g. repeat vector v n to repeat vector v for n cycles. A centralized architecture with a shared sequencer, as shown in Figure 2a, implies that sequencer instructions are issued to all pins simultaneously. This only allows repeat-per-all-pins. For repeat-per-pin and repeat-per-pin-group, a sequencer per pin or per port is mandatory, as shown in Figure 2b. In both figures, each of the n channels/pins has its own vector memory.
Figure 2: (a) Shared sequencer, vs. (b) sequencer per pin
4.2 ATE Ports
In core-based SoC testing, a port is commonly associated with a group of pins dedicated to access a particular core. The port concept also allows grouping pins that are inactive during a particular test phase. For instance, functional digital pins that are inactive during scan shifting can be grouped. Applying dummy data to these pins (e.g., repeating a ‘break waveform’) during scan shifting allows for saving vector memory. The same is possible for devices with unbalanced scan chains, where the scan pins for shorter scan chains are idle while the longer scan chains are still being loaded/unloaded.
Each ATE channel has to be assigned to a port. Typically, the number of ports is smaller than the number of channels. A realistic case is to have 64 ports and up to 1,024 ATE channels. As shown before, the best results with vector repeat are obtained when minimizing the number of pins per port, and a single pin per port (repeat-per-pin) gives optimal results. Since the number of pins usually exceeds the number of ports, it is required to assign multiple pins to most of the ports. Determining the number of ways for assigning n labeled pins to at most n indistinguishable ports corresponds to the classical problem of determining the number of partitions of a set with n elements. The number of assignments corresponds to the so-called Bell numbers [3], and increases exponentially with n. Hence, an exhaustive search for assigning pins to ports is not practically feasible for large n, and heuristics have to be used. In this paper, we simply assign pins to ports in an arbitrary way.
4.3 Vector Memory Organization
The vector memory organization has a large impact on the performance of vector repeat. In this paper, we consider two ATE vector memory organizations, as shown in Figure 3 and Figure 4. The vector memory in Figure 3 contains v locations, at address 0 to v-1, and corresponds to an ideal architecture allowing optimal run-length encoding. The sequencer can address each individual vector in the vector memory. For instance, sequencer instruction repeat vector 5 10 repeats 10 times the vector at address 5. A further extension is to repeat a sequence of multiple vectors. For instance, instruction repeat vector-sequence 5 2 10 indicates that the sequence of 2 vectors, at address 5 and 6, should be repeated 10 times. An entry in the vector memory stores an index into the waveform table. The waveform table stores a number of different waveforms that can be applied on an ATE channel. If the waveform table contains e.g. 32 entries, then each vector should be 5 bits wide in order to address an entry in this table. Each pin has its own vector memory and waveform table. The ATE contains either a single sequencer that is shared among all pins, or a sequencer per pin, as indicated in Figure 2. Using a waveform table that is indexed from the vector memory has the advantage that the entries in the vector memory can be narrower than when actual vector data and waveforms are stored directly in the vector memory. The waveform table can be updated with new waveforms during test execution, and hence a large number of waveforms can still be defined.
Figure 3: Vector memory addressable per vector
The vector memory organization in Figure 4 is more realistic and corresponds to the Agilent 93000 SoC test system. The Agilent 93000 tester has a unique per-pin architecture, and contains a sequencer per pin. The vector memory contains v locations and is organized in blocks of 7 vectors. The blocks are numbered from 0 to w-1. A sequencer instruction should always refer to the first (shaded) vector in a block. For instance, instruction repeat vector 7 10 indicates that the first vector in block 1, at address 7, should be repeated 10 times. Instruction repeat vector 8 10 would be illegal, since address 8 is not the first vector in a block. Each vector now consists of 6 bits: 5 bits to index the waveform table, and 1 bit to flag whether the particular vector is active or inactive. Inactive vectors are simply ignored during test execution.
Figure 4: Vector memory addressable per vector block
For example, if the fifth vector in a sequence of six vectors should be repeated, then the first four vectors are stored in vector block 0 at address 0 to 3. These locations are marked as active vectors. The remaining vectors in the vector block at address 4 to 6 are marked as inactive. The sequencer contains the instruction generate vectors 0, to issue the vector sequence starting from address 0. The fifth vector is stored at the first location in vector block 1, at address 7, and the sequencer should contain the instruction repeat vector 7 10 to repeat this vector 10 times. The sixth vector is stored at the first location in vector block 2, at address 14, and an appropriate generate vectors or repeat vector sequencer instruction is required. This vector memory organization is less advantageous for vector repeat, since 6 inactive vectors are inserted after each repeated vector. Furthermore, on average 3 inactive vectors are inserted before a repeated vector if it is preceded by a vector that is not repeated. For low repeat counts, it is therefore better to store the vectors without using vector repeat instructions. The sequencers in Figure 3 and Figure 4 have their own program memory to store sequencer instructions. The use of repeat instructions implies that less vector memory is required to store vector data, but more sequencer program memory is required. The size of the sequencer memory imposes an upper limit on the number of repeat instructions that can be used. When required, the number of repeat instructions can be reduced by using vector repeat only for those vectors that can be repeated more than some minimum number of times. This also improves the relative efficiency of vector repeat, since the test data reduction is larger for repeat instructions that repeat a vector more often. On the Agilent 93000 SoC test system it is therefore preferred to use vector repeat instructions only for vectors that can be repeated at least 16 times.
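The placement rules of the block-organized memory can be made concrete with a small model. The sketch below is a simplification under stated assumptions: the function name, the tuple format of the instructions, and the `min_repeat` parameter are hypothetical and do not represent actual tester syntax; memory usage is counted up to the last address written. It reproduces the six-vector example above.

```python
BLOCK_SIZE = 7  # vectors per block, as in the organization of Figure 4


def layout_block_memory(runs, min_repeat=2):
    """Place run-length-encoded vectors into block-organized vector memory.

    runs:       list of (waveform_index, repeat_count) pairs for one pin.
    min_repeat: runs shorter than this are stored as plain vectors.
    Returns (memory, program); memory entries are (waveform_index, active).
    The instruction tuples are illustrative, not actual tester syntax.
    """
    memory, program = [], []
    in_sequence = False  # True while appending to an open 'generate' sequence

    def align_to_block():
        # Fill the rest of the current block with inactive vectors.
        while len(memory) % BLOCK_SIZE:
            memory.append((None, False))

    for waveform, count in runs:
        if count >= min_repeat:
            # A repeated vector must sit at the first location of a block.
            align_to_block()
            program.append(("repeat vector", len(memory), count))
            memory.append((waveform, True))
            in_sequence = False
            continue
        for _ in range(count):
            if not in_sequence:
                # A new plain sequence must also start on a block boundary.
                align_to_block()
                program.append(("generate vectors", len(memory)))
                in_sequence = True
            memory.append((waveform, True))
    return memory, program


# Six vectors, the fifth of which is repeated 10 times (waveform ids 1..6).
runs = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 10), (6, 1)]
memory, program = layout_block_memory(runs)
print(program)
print(len(memory), sum(1 for _, active in memory if not active))
```

In this model the repeat instruction saves nothing for a repeat count of 10: 15 locations are occupied either way, 9 of them by inactive vectors. Raising `min_repeat` to 16 stores the same data as one plain sequence with a single sequencer instruction, which is consistent with the threshold mentioned in the text.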
4.4 Data Queue
When executing a test, the ATE should provide a continuous stream of data with appropriate timing to the IC, as defined in the test set-up. The Agilent 93000 therefore contains a data queue as buffer memory. Data is continuously being read from the data queue at a fixed rate and issued to the IC. At the same time, the data queue is being filled with new data from the vector memory. The fill rate typically varies over time due to several reasons. For instance, writing to the data queue is temporarily stalled whenever the ATE is decoding a vector repeat instruction. It should however be avoided that the buffer runs empty. The time for decoding a vector repeat instruction should therefore be compensated, and requires that a vector repeat instruction fills the data queue with a sufficient amount of repeated vectors. This imposes a lower bound on the number of times that a vector is repeated. The actual value of this lower bound depends on the data queue size, and the rates at which the queue is being filled and read. For high-speed testing above 200 MHz, the lower bound due to the data queue may exceed the lower bound of 16 due to the memory organization. However, the impact of the data queue can be neglected for scan testing, since the scan shift cycles are usually applied at much lower frequencies.
5. Experimental Results
5.1 ATPG Patterns and Padding
We performed experiments on 11 Philips ICs from various application fields. Table 3 shows the characteristics of these circuits. Column 1 shows the circuit name. The circuits are named PN, where P denotes Philips and N denotes the number of gates. For example, the largest circuit, P2893k, contains 2.9M gates. Columns 2 and 3 report the number of flip-flops and scan chains. Column 5 reports the number of ATPG test patterns, using the padding strategy as indicated in column 4. We used the Philips ATPG tool AMSAL, which generates patterns using static and dynamic compaction techniques. After a pattern is generated, the don’t-care bits are filled according to the specified padding strategy. Fault simulation is performed next with the padded pattern, and all detected faults are dropped. Using a different padding strategy therefore influences the number of patterns. It can be seen that the number of patterns is roughly equal for random-fill and repeat-fill, while the number of patterns increases considerably in several cases with 0-fill or 1-fill. Column 6 shows the average percentage of care bits per pattern, which ranges from 1 to 5%. The percentage of care bits slightly differs when using different padding strategies. It can also be observed that the number of care bits tends to decrease with increasing circuit size, which is mainly due to our ATPG setting. Column 7 shows the difference in fault coverage per circuit, compared to the fault coverage obtained with random-fill. It can be seen that nearly the same fault coverage is achieved per circuit with all padding strategies, although the number of test patterns differs per padding strategy. We further examined the impact of the padding strategy on the quality of the test patterns. Column 8 reports the number of failure sets. A failure set corresponds to a set of stuck-at faults that cannot be distinguished during diagnosis. Hence, all faults in a failure set cause the same fault effects on the (pseudo-)primary outputs. A higher number of failure sets corresponds to higher diagnostic resolution. In column 8, the number of failure sets per circuit with randomly-filled test patterns corresponds to 100%. It can be seen that the diagnostic resolution decreases by 10 to 20% when not using random-fill.
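| circuit | flip-flops | scan chains | ATPG padding | patterns | care bits (%) | ∆FC (%) | failure sets (%) | ≥32 detects (%) |
|---|---|---|---|---|---|---|---|---|
| P141k | 11k | 24 | random-fill | 551 | 4.3 | 0.00 | 100 | 63.11 |
| | | | 0-fill | 563 | 4.9 | -0.03 | 90.40 | 30.48 |
| | | | 1-fill | 554 | 4.8 | -0.01 | 89.14 | 32.55 |
| | | | repeat-fill | 516 | 4.7 | 0.00 | 90.38 | 49.37 |
| P267k | 17k | 45 | random-fill | 962 | 2.9 | 0.00 | 100 | 64.02 |
| | | | 0-fill | 970 | 3.3 | 0.01 | 93.68 | 28.39 |
| | | | 1-fill | 955 | 3.2 | 0.00 | 86.68 | 30.18 |
| | | | repeat-fill | 951 | 3.0 | -0.01 | 87.73 | 45.73 |
| P269k | 17k | 45 | random-fill | 949 | 2.9 | 0.00 | 100 | 63.42 |
| | | | 0-fill | 961 | 3.3 | 0.00 | 93.93 | 27.83 |
| | | | 1-fill | 953 | 3.1 | -0.01 | 88.47 | 30.46 |
| | | | repeat-fill | 931 | 3.0 | 0.00 | 88.10 | 45.39 |
| P279k | 18k | 55 | random-fill | 785 | 3.2 | 0.00 | 100 | 63.12 |
| | | | 0-fill | 817 | 3.4 | 0.04 | 90.35 | 31.14 |
| | | | 1-fill | 873 | 3.1 | 0.06 | 87.60 | 36.56 |
| | | | repeat-fill | 747 | 3.7 | 0.07 | 89.81 | 47.05 |
| P286k | 18k | 55 | random-fill | 1,151 | 2.2 | 0.00 | 100 | 63.92 |
| | | | 0-fill | 3,838 | 1.4 | 0.09 | 90.44 | 33.00 |
| | | | 1-fill | 1,307 | 2.3 | -0.04 | 85.35 | 37.44 |
| | | | repeat-fill | 1,192 | 2.3 | 0.14 | 91.37 | 49.07 |
| P330k | 17k | 64 | random-fill | 2,642 | 1.7 | 0.00 | 100 | 82.26 |
| | | | 0-fill | 2,486 | 2.0 | -0.05 | 90.32 | 35.43 |
| | | | 1-fill | 2,361 | 1.9 | -0.05 | 83.58 | 37.06 |
| | | | repeat-fill | 2,319 | 1.8 | -0.04 | 86.39 | 51.33 |
| P388k | 24k | 50 | random-fill | 492 | 3.2 | 0.00 | 100 | 72.69 |
| | | | 0-fill | 673 | 2.9 | -0.02 | 85.67 | 39.78 |
| | | | 1-fill | 648 | 3.0 | -0.01 | 77.42 | 40.50 |
| | | | repeat-fill | 449 | 3.8 | -0.01 | 84.73 | 54.44 |
| P418k | 29k | 64 | random-fill | 615 | 2.6 | 0.00 | 100 | 69.31 |
| | | | 0-fill | 796 | 2.5 | -0.01 | 90.32 | 25.01 |
| | | | 1-fill | 736 | 2.6 | -0.01 | 86.56 | 29.54 |
| | | | repeat-fill | 602 | 2.9 | 0.00 | 86.84 | 45.01 |
| P951k | 104k | 82 | random-fill | 631 | 1.2 | 0.00 | 100 | 81.40 |
| | | | 0-fill | 2,099 | 1.1 | -0.01 | 85.56 | 26.54 |
| | | | 1-fill | 2,179 | 1.1 | -0.03 | 81.23 | 32.66 |
| | | | repeat-fill | 834 | 1.2 | -0.01 | 83.37 | 65.16 |
| P2074k | 59k | 57 | random-fill | 1,038 | 1.8 | 0.00 | 100 | 66.78 |
| | | | 0-fill | 1,773 | 1.5 | -0.05 | 92.34 | 28.55 |
| | | | 1-fill | 1,604 | 1.5 | -0.05 | 87.57 | 30.71 |
| | | | repeat-fill | 1,324 | 1.6 | -0.03 | 86.70 | 42.20 |
| P2893k | 182k | 143 | random-fill | 2,054 | 1.5 | 0.00 | 100 | 76.39 |
| | | | 0-fill | 7,780 | 1.2 | -0.16 | 90.38 | 35.04 |
| | | | 1-fill | 3,079 | 1.4 | -0.04 | 84.74 | 39.59 |
| | | | repeat-fill | 2,539 | 1.5 | -0.02 | 85.79 | 51.63 |

Table 3: Circuit characteristics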
Column 9 reports the percentage of failure sets for which fault effects can be observed on at least 32 outputs, possibly in multiple patterns. Test patterns that detect each stuck-at fault multiple times tend to detect more unmodeled faults (e.g., [23]). A higher number of multiple detects therefore correlates with higher defect coverage. It can be seen from column 9 that the number of failure sets with a high number of multiple detects is reduced by roughly 50% when using 0-fill or 1-fill instead of random-fill, and by roughly 30% when using repeat-fill.
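The two quality metrics of columns 8 and 9 can be illustrated with a small sketch. It assumes that the fault effects of each fault are available as a set of (pattern, output) pairs. The function names and the fault data are hypothetical, and counting distinct (pattern, output) pairs as "detects" is one possible reading of the definition above, not necessarily the one used for Table 3.

```python
from collections import defaultdict


def failure_sets(fault_effects):
    """Group faults that produce identical fault effects.

    fault_effects: dict mapping a fault name to the set of
    (pattern_index, output_name) pairs on which the fault is observed.
    Faults with the same set cannot be told apart during diagnosis.
    """
    groups = defaultdict(list)
    for fault, effects in fault_effects.items():
        groups[frozenset(effects)].append(fault)
    return dict(groups)


def fraction_with_min_detects(groups, min_detects):
    """Share of failure sets observed on at least min_detects
    (pattern, output) pairs; counting pairs is an assumption here."""
    hits = sum(1 for effects in groups if len(effects) >= min_detects)
    return hits / len(groups)


# Hypothetical fault data for three stuck-at faults.
effects = {
    "a/sa0": {(0, "o1"), (1, "o2")},
    "b/sa1": {(0, "o1"), (1, "o2")},  # same effects as a/sa0
    "c/sa0": {(2, "o1")},
}
groups = failure_sets(effects)
print(len(groups))                           # number of failure sets
print(fraction_with_min_detects(groups, 2))  # share with >= 2 detects
```

More failure sets for the same fault list means faults are split into finer groups, which is why a higher count indicates higher diagnostic resolution.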
Hence, although the same stuck-at fault coverage is achieved with all padding strategies, the actual test quality may be reduced due to lower diagnostic resolution and a lower number of multiple detects. This is particularly the case with 0-fill and 1-fill, and to a lesser extent with repeat-fill. The quality of the patterns may be improved by padding some patterns with random-fill and other patterns with repeat-fill, as suggested in [1]. An alternative approach is to use an on-chip LFSR to generate pseudo-random patterns. The LFSR outputs can be modified with deterministic data from the ATE to ensure that all care bits from deterministic ATPG patterns are generated, while the LFSR fills all don’t-care bits pseudo-randomly. The deterministic data on the ATE can still be compressed efficiently by vector repeat per port.
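For reference, the four padding types compared above can be sketched as follows. This is a minimal illustration, not the ATPG tool's implementation: the function names `pad` and `runs`, the use of 'X' for don’t-care bits, and the choice to start repeat-fill from 0 when no bit has been seen yet are assumptions. Repeat-fill is taken here to mean that each don’t-care bit copies the value immediately preceding it in the same stream.

```python
import random
from itertools import groupby


def pad(cube, mode, rng=None):
    """Fill the don't-care bits ('X') of one pin's stimulus stream."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    filled = []
    prev = "0"  # assumed start value for repeat-fill
    for bit in cube:
        if bit == "X":
            if mode == "0-fill":
                bit = "0"
            elif mode == "1-fill":
                bit = "1"
            elif mode == "repeat-fill":
                bit = prev  # copy the preceding (already filled) value
            elif mode == "random-fill":
                bit = rng.choice("01")
            else:
                raise ValueError(f"unknown padding mode: {mode}")
        filled.append(bit)
        prev = bit
    return "".join(filled)


def runs(stream):
    """Number of runs of identical consecutive bits in a filled stream."""
    return sum(1 for _ in groupby(stream))


# Made-up test cube for a single scan-in pin: four care bits, eight don't-cares.
cube = "1XX1XX0XX0XX"
for mode in ("random-fill", "0-fill", "1-fill", "repeat-fill"):
    filled = pad(cube, mode)
    print(f"{mode:12s} {filled} runs={runs(filled)}")
```

For this cube, repeat-fill yields 2 runs, against 4 for 0-fill and 5 for 1-fill. With repeat-fill a new run starts only where a care bit differs from the value before it, which is why it tends to give the longest run-lengths.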
5.2 Vector Repeat Per Port
We applied vector repeat per port to the pattern sets reported in Table 3, assuming a sequencer per pin or per port. The results are reported in Table 4. Columns 1 and 2 indicate the circuit name and the padding type. We used the ideal ATE vector memory organization (see Figure 3), in which the sequencer can repeat each individual vector, except for the shaded rows (repeat-fill *), which correspond to the Agilent 93000 vector memory organization (see Figure 4).
Column 3 reports the vector memory usage per pin without compression, i.e. without using vector repeat. The column shows the absolute number of vectors for each circuit with random-fill, which is traditionally the default case. For the other padding types, the number of vectors is given as a percentage of this default case. For instance, the randomly-filled vectors for circuit P141k require 268,823 locations in the vector memory per pin. Only 93.5% of these 268,823 locations are required for storing the vectors with repeat-fill.
Columns 4 to 8 show the results when using test vector compression by means of repeat-per-pin-group. In all cases we consider a single port that contains either all scan-in and scan-out pins (si+so pins), all scan-in pins (all si pins), 8 scan-in pins (8 si pins), 4 scan-in pins (4 si pins), or 2 scan-in pins (2 si pins). The columns report the required vector memory per pin as a percentage of the memory required for storing the uncompressed randomly-filled vectors (as reported in column 3). For instance, applying vector repeat for circuit P141k on a port with 2 scan-in pins, using the test pattern set with repeat-fill, requires only 3.9% of the 268,823 vectors that are required for storing the randomly-filled vectors without vector repeat.
The last rows in Table 4 show the cumulative results for all circuits and correspond to weighted averages. It can be seen that repeat-fill yields the largest reduction, while random-fill gives the smallest reduction.
This is in line with the expected, theoretical results in Section 3. For the three largest circuits, the vector memory usage on an ideal ATE is reduced to less than 1% when using repeat-per-port on a port with two pins and repeat-fill. The reduction in vector memory usage is larger when the number of pins per port is smaller. No reduction is achieved in column 4 (si+so pins), which is due to the large port size and the fact that test responses cannot be compressed efficiently. Column 5 (all si pins) corresponds to the traditional approach of vector repeat on all input pins, as used in [1][22]. The data volume is reduced on average to 37.8% on an ideal ATE with repeat-fill, which corresponds to roughly 3X compression. Much better results are achieved when applying vector repeat-per-pin-group with smaller numbers of pins per port, as shown in columns 6 to 8. The required vector memory is reduced on average to 3.4% on an ideal ATE with repeat-fill using two pins per port, which corresponds to 30X compression. The required vector memory is reduced to 4.6% when using either 0-fill or 1-fill, and to 47.1% when using random-fill, which corresponds to a compression ratio of 22X and 2X respectively.

                                       vectors (%) with repeat-per-port
circuit ATPG padding       uncompr.    si+so   all si     8 si     4 si     2 si
                            vectors     pins     pins     pins     pins     pins
P141k   random-fill         268,823    100.0     98.9     97.0     91.3     72.7
        0-fill              102.2 %     86.2     46.8     30.6     19.2      7.6
        1-fill              100.4 %     96.8     50.7     35.1     21.8     11.2
        repeat-fill          93.5 %     82.2     26.3     16.0      9.4      3.9
        repeat-fill *        93.5 %     93.3     63.9     58.2     40.1     16.4
P267k   random-fill         476,684    100.0     94.8     79.0     74.2     59.1
        0-fill              100.8 %    100.8     80.8     63.8     57.2     29.7
        1-fill               99.2 %     99.1     79.5     63.0     56.3     28.7
        repeat-fill          98.9 %     98.8     79.0     61.8     55.7     28.4
        repeat-fill *        98.9 %     98.8     82.5     75.0     69.8     39.4
P269k   random-fill         470,249    100.0     94.8     79.0     74.3     59.2
        0-fill              101.3 %    101.2     81.1     64.2     57.5     30.0
        1-fill              100.4 %    100.4     80.4     63.8     57.0     29.2
        repeat-fill          98.1 %     98.1     78.4     61.4     55.2     28.1
        repeat-fill *        98.1 %     98.1     81.9     74.4     69.3     39.1
P279k   random-fill         327,761    100.0     98.8     85.1     76.9     58.8
        0-fill              104.1 %     99.3     54.7     15.9      5.2      2.9
        1-fill              111.2 %    106.6     61.0     17.9      6.9      4.5
        repeat-fill          95.0 %     90.1     33.0      8.8      3.0      2.0
        repeat-fill *        95.0 %     94.8     71.1     35.3     15.9     10.5
P286k   random-fill         480,383    100.0     98.8     85.2     77.1     58.9
        0-fill              330.1 %    297.9     60.2     14.2      7.5      4.8
        1-fill              113.1 %    100.6     50.4     14.4      5.6      3.8
        repeat-fill         102.5 %     83.8     25.9      6.5      2.4      1.6
        repeat-fill *       102.5 %    102.3     68.8     30.0     13.7      8.9
P330k   random-fill         840,473    100.0    100.0     98.8     82.1     29.5
        0-fill               94.1 %     86.3     47.2     29.6      5.0      1.0
        1-fill               89.4 %     87.7     43.5     20.9      7.2      1.7
        repeat-fill          87.8 %     84.6     24.8     12.6      3.5      0.8
        repeat-fill *        87.8 %     87.8     70.6     52.9     14.4      1.5
P388k   random-fill         269,670    100.0     99.8     98.6     81.0     46.5
        0-fill              136.7 %    129.8     78.4     18.6      8.3      2.7
        1-fill              131.6 %    127.2     75.4     17.2      7.7      2.8
        repeat-fill          91.3 %     83.6     36.6      8.1      3.9      1.1
        repeat-fill *        91.3 %     90.3     72.1     27.1     14.2      4.7
P418k   random-fill         512,511    100.0     99.8     98.3     87.4     74.1
        0-fill              129.4 %    119.2     63.9     17.6      9.2      7.4
        1-fill              119.6 %    117.5     63.2     19.0     10.4      8.1
        repeat-fill          97.9 %     93.3     38.6      8.9      4.9      4.1
        repeat-fill *        97.9 %     97.8     76.1     38.0     23.4     20.3
P951k   random-fill         873,423    100.0     99.8     96.0     78.3     51.7
        0-fill              332.3 %    332.3    332.1    322.7    224.1      0.6
        1-fill              344.9 %    344.9    344.8    335.0    232.6      1.1
        repeat-fill         132.1 %    132.1    132.0    128.2     89.1      0.2
        repeat-fill *       132.1 %    132.1    132.0    129.5    112.0      1.3
P2074k  random-fill       1,926,305    100.0     94.5     32.7     32.4     27.0
        0-fill              170.7 %    127.8     42.9      1.7      1.5      0.8
        1-fill              154.5 %    138.4     54.5      2.0      1.5      0.7
        repeat-fill         127.5 %    101.0     22.5      0.8      0.7      0.4
        repeat-fill *       127.5 %    123.0     68.0      3.8      3.4      2.0
P2893k  random-fill       4,237,409    100.0     94.8     91.9     68.4     49.0
        0-fill              372.7 %    316.6     48.0      7.0      3.2      1.9
        1-fill              149.8 %    143.2     47.6      7.5      2.8      1.5
        repeat-fill         123.2 %    108.3     21.2      3.1      1.3      0.7
        repeat-fill *       123.2 %    118.6     52.2     14.4      6.3      3.5
Σ       random-fill      10,683,691    100.0     96.3     81.0     66.8     47.1
        0-fill              252.4 %    219.1     75.4     40.7     27.0      4.6
        1-fill              151.4 %    144.8     77.7     41.4     27.7      4.6
        repeat-fill         115.2 %    102.5     37.8     19.9     13.9      3.4
        repeat-fill *       115.2 %    112.5     69.0     34.2     23.2      7.7

Table 4: Vector memory usage with vector repeat per port using ideal ATE architecture (* Agilent 93000)
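The per-port accounting behind Table 4 can be illustrated with a small sketch. It is not the tool flow used for the experiments: the function name `encode_port`, the `min_repeat` parameter, and the rule that a block of consecutive non-repeated vectors is played by a single sequencer instruction are assumptions made for illustration. A port vector is the tuple of bits applied to the pins of the port in one cycle. A run of identical port vectors costs one memory location when it is issued as a vector repeat, and one location per cycle otherwise. Setting `min_repeat=1` models the ideal ATE, while a larger threshold models a tester that issues repeat instructions only for sufficiently long runs.

```python
from itertools import groupby


def encode_port(pin_streams, min_repeat=1):
    """Run-length encode the stimuli of one port.

    pin_streams: one fully padded bit string per pin of the port.
    min_repeat:  shortest run issued as a vector repeat
                 (hypothetical knob; 1 models an ideal ATE).
    Returns (stored_vectors, repeat_instructions, total_instructions).
    """
    stored = repeats = instructions = 0
    in_plain_block = False
    # zip(*...) turns the per-pin streams into one port vector per cycle.
    for _, run in groupby(zip(*pin_streams)):
        length = sum(1 for _ in run)
        if length >= min_repeat:
            stored += 1            # one location, repeated `length` times
            repeats += 1
            instructions += 1
            in_plain_block = False
        else:
            stored += length       # short run is stored vector by vector
            if not in_plain_block:
                instructions += 1  # assumed: one instruction per plain block
                in_plain_block = True
    return stored, repeats, instructions


# Two scan-in pins after repeat-fill (made-up data, 8 cycles).
pin_a = "11110000"
pin_b = "00111110"

two_pin_port = encode_port([pin_a, pin_b])               # one port, both pins
per_pin = [encode_port([pin_a]), encode_port([pin_b])]   # one port per pin
thresholded = encode_port([pin_a, pin_b], min_repeat=3)  # constrained tester
```

In this toy case the two-pin port needs 4 memory locations per pin, whereas separate one-pin ports need 2 and 3, which mirrors the trend in Table 4 that smaller ports compress better. With `min_repeat=3` the same two-pin port needs 6 locations, because only the run of three identical vectors is issued as a repeat. The compression factors quoted in the text are the reciprocal of the remaining memory fraction, e.g. 100/3.4 ≈ 29 and 100/7.7 ≈ 13.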
The results for the Agilent 93000 tester with repeat-fill, as shown in the shaded rows in Table 4 (repeat-fill *), are obtained by using vector repeat instructions only for vectors that can be repeated at least 17 times. It can be seen that the constraints imposed by the vector memory architecture limit the performance of vector repeat per port. The vector memory is reduced on average to 7.7% with repeat-fill using two pins per port, which still corresponds to 13X compression.
Table 5 shows the number of sequencer instructions, using repeat-per-pin-group on the Agilent 93000 and the test vectors with repeat-fill. For each port configuration, the total number of instructions in the sequencer memory (total) is shown, as well as the number of vector repeat instructions (repeat). On average 60% of the sequencer instructions are vector repeat instructions. The largest number of sequencer instructions for any circuit in Table 5 is 64,201. This is well below the available sequencer memory on the Agilent 93000, which can store up to 2M sequencer instructions. It can further be seen from the Σ row that the number of sequencer instructions decreases as the number of scan-in pins per port decreases. A lower number of pins per port allows for a relatively small number of runs (i.e. sequences of identical vectors) with long run-lengths, which corresponds to a low number of vector repeat instructions with large repeat counts. A higher number of pins per port requires more vector repeat instructions with lower repeat counts.

                si+so pins       all si pins         8 si pins         4 si pins         2 si pins
circuit     total   repeat    total   repeat    total   repeat    total   repeat    total   repeat
P141k          10        5    3,614    2,136    4,834    2,779    4,679    2,702    2,658    1,609
P267k          10        4    3,796    1,898    5,478    2,740    9,228    4,617    6,668    3,408
P269k           6        2    3,715    1,858    5,367    2,685    9,003    4,503    6,586    3,375
P279k          63       37    5,350    3,235    6,089    3,694    5,386    3,619    3,892    2,520
P286k          64       40   10,481    6,303    9,960    6,149    7,078    4,818    5,007    3,224
P330k           9        3   12,463    6,755   17,867    9,859    9,244    6,389      713      405
P388k         383      191    2,319    1,368    3,553    2,074    2,457    1,491    1,367      899
P418k          23       11    6,196    3,482   10,157    5,961    8,661    5,242    8,030    4,707
P951k           4        1        4        1    1,672      835   19,647    9,842    1,564      895
P2074k      7,149    3,734   33,790   22,006    6,861    4,077    6,645    3,934    4,432    2,489
P2893k      8,975    4,569   64,201   39,490   44,453   26,643   25,853   15,488   15,944    9,730
Σ          16,696    8,597  145,929   88,532  116,291   67,496  107,881   62,645   56,861   33,261

Table 5: Sequencer instructions with repeat-per-port using Agilent 93000

6. Conclusion
We showed that vector repeat per port is an efficient and effective method for reducing the data volume of test stimuli. Vector repeat corresponds to run-length encoding of test vectors. We improve the encoding by filling the don’t-care stimulus bits to obtain longer run-lengths. We presented a probabilistic analysis of the performance of vector repeat per port with various ATPG padding strategies. The analysis shows that the best results are achieved with repeat-fill, which outperforms 0-fill, 1-fill, and random-fill. This is confirmed by our experimental results.
We discussed the impact of the ATE architecture on vector repeat. The port concept, a sequencer per port, and the ATE vector memory organization have a large impact on vector repeat. The port concept allows for repeat-per-pin, repeat-per-pin-group, or repeat-per-all-pins. The best results are obtained when using multiple ports and minimizing the number of pins per port. Our experimental results show that ATPG repeat-fill with ATE vector repeat per port allows for a considerable reduction of the test stimulus data volume to be stored on the ATE. On average, a 30X reduction of the vector memory usage is achieved with an ideal ATE architecture using two pins per port. When considering the constraints imposed by a realistic ATE architecture, i.e. the Agilent 93000 SoC test system, a 13X reduction is still achieved on average using repeat-fill and two pins per port.
Our experimental results also show that ATPG repeat-fill causes a slight reduction in test quality. In our future work, we plan to quantify this effect further. We also plan to mitigate it, e.g. by using an on-chip LFSR to fill the don’t-care stimulus bits pseudo-randomly on-chip, while still storing the stimuli on the ATE in a compressed format using ATPG repeat-fill and vector repeat per port.
7. Acknowledgements We acknowledge Hans Dingemanse, Andreas Glowatz, and Michael Wittke at Philips Semiconductors for their contributions to the work described in this paper. We acknowledge Henk Hollmann at Philips Research for referring to the Bell numbers. We acknowledge Erik Jan Marinissen at Philips Research for initiating this work. This research work was partly supported by the German Federal Ministry for Education and Research in the project AZTEKE under contract number 01M3063C.
8. References
[1] C. Barnhart et al., OPMISR: The Foundation for Compressed ATPG Vectors, IEEE Proc. Int. Test Conf., 2001, pp. 748-757.
[2] I. Bayraktaroglu, A. Orailoglu, Test Volume And Application Time Reduction Through Scan Chain Concealment, ACM/IEEE Proc. Design Automation Conf., 2001, pp. 151-155.
[3] T. Bell, Exponential Numbers, Amer. Math. Monthly, Vol. 41, 1934, pp. 411-419.
[4] A. Chandra, K. Chakrabarty, Frequency-Directed Run-Length (FDR) Codes with Application to System-on-a-Chip Test Data Compression, IEEE Proc. VLSI Test Symp., 2001, pp. 42-47.
[5] A. Chandra, K. Chakrabarty, System-on-a-Chip Test-Data Compression and Decompression Architectures Based on Golomb Codes, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 20, No. 3, 2001, pp. 355-367.
[6] A. Chandra, K. Chakrabarty, Reduction of SOC Test Data Volume, Scan Power and Testing Time Using Alternating Run-length Codes, ACM/IEEE Proc. Design Automation Conf., 2002, pp. 673-678.
[7] R. Dorsch, H.-J. Wunderlich, Tailored ATPG for Embedded Testing, IEEE Proc. Int. Test Conf., 2001, pp. 530-537.
[8] I. Hamzaoglu, J.H. Patel, Reducing Test Application Time for Full Scan Embedded Cores, IEEE Proc. Int. Symp. on Fault-Tolerant Computing, 1999, pp. 260-267.
[9] S. Hellebrand et al., Built-in Test for Circuits with Scan Based Reseeding of Multiple-Polynomial Linear Feedback Shift Registers, IEEE Trans. on Computers, Vol. 44, No. 2, 1995, pp. 223-233.
[10] S. Hellebrand, A. Würtenberger, Alternating Run-Length Coding - A Technique for Improved Test Data Compression, IEEE Test Resource Partitioning Workshop, 2002, pp. 4.3-1 - 4.3.10.
[11] H. Ichihara et al., Dynamic Test Compression Using Statistical Coding, IEEE Proc. Asian Test Symp., 2001, pp. 143-148.
[12] V. Iyengar, K. Chakrabarty, B.T. Murray, Built-In Self Testing of Sequential Circuits Using Precomputed Test Sets, IEEE Proc. VLSI Test Symp., 1998, pp. 418-423.
[13] A. Jas, J. Ghosh-Dastidar, N.A. Touba, Scan Vector Compression/Decompression Using Statistical Coding, IEEE Proc. VLSI Test Symp., 1999, pp. 114-120.
[14] A. Jas, B. Pouya, N.A. Touba, Virtual Scan Chains: A Means for Reducing Scan Length in Cores, IEEE Proc. VLSI Test Symp., 2000, pp. 73-78.
[15] A. Khoche, J. Rivoir, I/O Bandwidth Bottleneck for Test: Is it Real?, IEEE Test Resource Partitioning Workshop, 2000, pp. 2.3-1 - 2.3-6.
[16] A. Khoche et al., Test Vector Compression Using EDA-ATE Synergies, IEEE Proc. VLSI Test Symp., 2002, pp. 97-102.
[17] B. Koenemann, LFSR-Coded Test Patterns for Scan Designs, Proc. European Test Conf., VDE Verlag, 1991, pp. 237-242.
[18] B. Koenemann et al., A SmartBIST Variant with Guaranteed Encoding, IEEE Proc. Asian Test Symp., 2001, pp. 325-330.
[19] C.V. Krishna, A. Jas, N.A. Touba, Test Vector Encoding Using Partial LFSR Reseeding, IEEE Proc. Int. Test Conf., 2001, pp. 885-893.
[20] C.V. Krishna, N.A. Touba, Reducing Test Data Volume Using LFSR Reseeding with Seed Compression, IEEE Proc. Int. Test Conf., 2002, pp. 321-330.
[21] H.-G. Liang, S. Hellebrand, H.-J. Wunderlich, Two-Dimensional Test Data Compression for Scan-Based Deterministic BIST, IEEE Proc. Int. Test Conf., 2001, pp. 894-902.
[22] X. Liu et al., Techniques to Reduce Data Volume and Application Time for Transition Test, IEEE Proc. Int. Test Conf., 2002, pp. 983-992.
[23] E.J. McCluskey, C.-W. Tseng, Stuck-Fault Tests vs. Actual Defects, IEEE Proc. Int. Test Conf., 2000, pp. 336-343.
[24] S. Mitra, K.S. Kim, X-Compact: An Efficient Response Compaction Technique for Test Cost Reduction, IEEE Proc. Int. Test Conf., 2002, pp. 311-320.
[25] J. Rajski, J. Tyszer, N. Zacharia, Test Data Compression for Multiple Scan Designs with Boundary Scan, IEEE Trans. on Computers, Vol. 47, No. 11, 1998, pp. 1188-1200.
[26] J. Rajski, DFT for High-Quality Low Cost Manufacturing Test, IEEE Proc. Asian Test Symp., 2001, pp. 3-8.
[27] J. Rajski et al., Embedded Deterministic Test for Low Cost Manufacturing Test, IEEE Proc. Int. Test Conf., 2002, pp. 301-310.
[28] E.H. Volkerink, A. Khoche, S. Mitra, Packet-based Input Test Data Compression Techniques, IEEE Proc. Int. Test Conf., 2002, pp. 154-163.
[29] T. Yamaguchi et al., An Efficient Method for Compressing Test Data, IEEE Proc. Int. Test Conf., 1997, pp. 79-88.