Low Power Scan Shift and Capture in the EDT Environment

0 downloads 0 Views 288KB Size Report
INTERNATIONAL TEST CONFERENCE. 2 chain to be driven by a power-aware EDT decompressor, or by a constant value fixed for the entire scan load. The.
Low Power Scan Shift and Capture in the EDT Environment D. Czysz*, M. Kassab, X. Lin, G. Mrugalski, J. Rajski, J. Tyszer* Mentor Graphics Corporation 8005 SW Boeckman Road Wilsonville, OR 97070, USA

Abstract This paper presents a new and comprehensive power-aware test scheme compatible with a test compression environment. The key contribution of the paper is a flexible test application framework that achieves significant reductions in switching activity during all phases of scan test: scan loading, unloading, and capture.

1.

Introduction

On-chip test compression has quickly established itself as one of the mainstream DFT methodologies [37]. It assumes that a tester delivers test patterns in a compressed form, and on-chip decompressors expand them into the actual data loaded into scan chains. Typically, decompressors fill don’t care bits with random values, and therefore the amount of flip-flop toggling during test may result in a power droop condition that would not occur in the chip’s mission mode. This necessitates managing tests such as scan or logic BIST so that they fall within certain power consumption constraints. Otherwise, the power dissipation can increase by a factor of 3 [50] compared to the functional mode of operation, and is only expected to worsen with scan patterns already consuming 30x the mission mode’s peak power [35]. The bulk of test power consumption can also be attributed to unloading test responses with high transition counts. Excessive switching activity in scan chains and other parts of the circuit results in overheating or supply voltage noise, both causing a device malfunction, and thus loss of yield, reliability degradation, shorter product lifetime, or even permanent damage of a circuit. There are many solutions aimed at reducing power dissipation during scan testing in both shift and capture modes. They include test scheduling [10], [19], test vector reordering [12], partitioning [30], [34], [47], [46] and modifications of scan chains [4], [8], [18], [36], transition blocking [17], [25], clock gating [3], [32], [34] as well as test points insertion aimed at keeping the peak shift power below a given threshold [13]. Similar concepts have been proposed for BIST applications [2], [14] - [16], [38], [49], including low-transition test pattern generators [40], [42]. Other techniques tailor test patterns to the requirements of test application with the lowered switching activity. These schemes resolve test power problems by assigning certain non-random values to unspecified positions of test cubes causing power violations such that the number of transitions during scan-in shifting is minimized [39], [41], [43].

*

Poznań University of Technology ul. Piotrowo 3a 60-965 Poznań, Poland

For instance, don’t care bits can be mapped to the value of 0, thus creating strings of 0’s that can be further encoded using run-length codes as shown in [6], [7]. Using an observation that constant fill data tend to reduce test power, a minimum transition fill of [5], [31] replicates a value of the most recent care bit for all unspecified positions until the next care bit whose value will be used for the following positions. Similar concepts are exploited by a vector modification technique [20] and California scan architecture [9]. The minimum transition fill is also deployed in [33] to reassign new values to unspecified positions whose locations are determined by bit-stripping indicating whether turning a bit into unspecified one will affect fault coverage. If so, then the bit is returned to its original value, otherwise it is kept as a don’t care. Other forms of filling [28], [44] - [46] are aimed at decreasing capture power dissipation. They are guided by rules of suitability, which provide values that should be assigned to unspecified bits so that the number of transitions at the outputs of scan cells in the capture mode is minimized. Other techniques proposed to address the low capture power include [21] and [30]. Low power test data encoding schemes [22], [23], [29] adopt conventional LFSR reseeding techniques to reduce the scan-in transition probability. In particular, the method of [29] uses two LFSRs to produce actual test cubes and the corresponding mask bits. Outputs of both LFSRs are AND-ed or OR-ed to decrease the amount of switching. The use of extra seeds may compromise compression ratios, and therefore the scheme of [22] divides test cubes into blocks and only uses reseeding to encode blocks that contain transitions. Other blocks are replaced with a constant value which is fed directly into scan chains. Some of the above schemes such as design partitioning and scheduling can reduce global power during both shift and capture. Even if they are employed, power reduction is often still necessary for the design partitions being tested in a given session. Thus, a method is required that is nonintrusive to the design and design flow; can control power during scan loading, unloading, and capture; and supports embedded compression with high compression ratios. In this paper, we describe a comprehensive, compressionfriendly scheme aimed at reducing switching activity during all scan test phases: scan loading, unloading, and capture. It employs a controller that either allows a given scan

Paper 13.2 INTERNATIONAL TEST CONFERENCE 1-4244-4203-0/08/$20.00 © 2008 IEEE

1

2250

1750 1500 1250 1000 750 500 250

Paper 13.2

Figure 1

1125

937

750

562

375

Scan chains with specified bits

Another interesting factor is the number of scan chains used to capture and observe fault effects. Fig. 2 illustrates how many test patterns have their fault effects observed by a given number of scan chains in another circuit having 0.9M gates and 45K scan cells forming 100 scan chains. The test patterns applied here were generated such that half of the scan chains are driven by a decompressor, while the remaining chains receive a constant 0. In principle, all scan chains are designated as potential observation points during fault simulation. As can be seen, only a small fraction of scan chains is actually used to observe additional detected faults. Similar results were obtained for other industrial designs. It creates a ground rule for reduction of power expended during capture and unloading of test responses as there is no need to store test results in all scan chains. As shown in the following sections, one can take advantage of this observation by selective usage of either scan enable circuitry or respective clock trees. 900 800

# Test patterns

700 600 500 400 300 200 100

Figure 2

3.

81

61

41

0 21

• dynamic compaction is carried out only if a test cube has less than 20% of scan chains with specified bits, • half of scan chains with no specified bits are filled with a constant value (in the experiments these chains are selected randomly), • logic values for the remaining scan chains are determined by the on-chip decompression logic. Fig. 1 shows how many test patterns, after dynamic compaction, have their specified locations confined to a given number of scan chains. These figures were obtained for an industrial design comprising 2.5M gates and 143K scan cells forming 1200 scan chains. As can be seen, the vast majority of test cubes feature specified bits in less than 20% of the scan chains. At the same time, limiting the number of scan chains with specified bits has a moderate impact on a pattern count. For the same design, the pattern count increase is 3.4% compared to a test generation

187

0

Motivation

Patterns generated for scan-based tests have traditionally tended to ignore power consumption as their primary objective was to maximize the fault coverage regardless of switching activity. Moreover, the same generation process was aimed at reducing the number of vectors and thereby reducing the ATE time. Typically, ATPG uses dynamic compaction to reduce a pattern count. Starting from a test cube generated for a primary target fault, dynamic compaction expands this cube to cover other faults by assigning appropriate values to unspecified positions. Thus, the number of scan chains having specified bits gradually increases as dynamic compaction progresses. With the concern over test power dissipation, however, one needs to minimize the number of scan chains with specified bits while also minimizing a possible impact on the pattern count. As a result, scan chains with very low or zero fill rates may serve as a baseline in reducing scan shift power. Consider, as an example, results of experiments aimed at analyzing the number of scan chains populated by specified bits, and impact this quantity may have on a pattern count. A conventional EDT [26] was used as the experimental platform. In particular, we try to limit the number of scan chains that feature specified cells as follows:

# Test patterns

2000

1

2.

scheme that has no restrictions as far as scan chains hosting specified bits are concerned. It is important to note that there is no fault coverage loss reported due to this technique. Much the same can be said about three other industrial designs examined in similar experiments.

0

chain to be driven by a power-aware EDT decompressor, or by a constant value fixed for the entire scan load. The controller’s configuration is held constant for all cycles of a given scan load but is reloaded for every pattern with minimal control data. The same controller may be used to decide which scan chains remain in a shift mode during the capture cycle by judiciously disabling scan clocks. This is affective at reducing toggling during capture and unloading, especially when the scan chains kept in the shift mode loaded constant values. An alternate method to reduce switching during capture uses ATPG to turn off clock gaters unnecessary to target certain faults. This approach allows ATPG to prevent a large number of flip-flops from changing their states with relatively few specified bits, and with no modifications to the design.

Scan chains observing faults

Low power test architecture

Observations presented in the previous section clearly indicate that on-chip test data decompression can be taken to a new level by appropriately engineering the low power scan test application. Since deterministic test vectors have typically only a small fraction of bits specified, the remaining scan cells can be either randomly filled with 0s and 1s, or they may assume other forms of filling without com-

INTERNATIONAL TEST CONFERENCE

2

promising test coverage. Furthermore, as shown earlier, very often locations of specified bits can be confined to a few scan chains. Consequently, we can feed these scan chains directly from a test data decompressor while replacing all don’t care positions in the remaining chains with a constant value. Such an approach may significantly reduce the number of transitions during scan-in shifting.

tent of a scan chain at speed, the result can be deemed unpredictable. Assuring reliable values in such scan chains requires, therefore, masking out many scan cells, which in turn can severely affect compression. A way around this problem is to tie the control of the stimuli with that of scan enable such that only scan chains which load constant values are kept in the shift mode during capture.

A solution implementing the above idea is shown in Fig. 3. Input channels provide compressed test patterns to the decompressor. The same channels can be used to deliver information to a control block. The control block comprises a control register and combinational XOR logic driven by data stored, in a compressed form, in the register. The control block outputs gating signals to AND (alternatively OR) gates such that indicated scan chains can be either connected with the decompressor or can be fed by a constant value of 0 (1) on a per pattern basis. Further details regarding the low power scan shift-in operations are presented in Sections 4 and 5.

This is the principle behind a solution shown in Fig. 3 (note that the Test Mode input is set to 1 for the entire test session). We assume that the same controller is used for constant stimuli and scan enable. As a result, only one solver is needed since the values of the stimuli and scan enable controls are now tied to each other. If a scan chain loads constant values, then its scan enable will be kept high during the capture cycle, forcing low toggling capture and scan shift-out operations. For a scan chain which must be loaded by the decompressor (de-asserted scan enable during capture), the corresponding low power controller output is 1, allowing the input Scan Enable to take over.

SE



Decompressor

SE

Note that it should be possible to partly separate controls such that every scan chain with a scan enable asserted during capture is loaded with constant values while those with scan enable de-asserted during capture can either load constant values or load values from the decompressor. This allows improved power reduction of the stimuli if one does not have sufficient granularity to control scan enable. In other words, some scan chains would load constant values even though those values may change during capture.

4.

SE

… XOR network Control register Scan Enable

Figure 3

Test Mode

Scan-based low power scheme

The bulk of power dissipation is attributed to responses that usually exhibit high transition counts. As shown earlier, reduction of capture and shift-out power can be achieved due to error propagation sites that are typically confined to a few scan chains, only. Thus, one can avoid capturing data that would overwrite the otherwise constant values being scanned in. Some extra safety precautions can be essential, however, when relying on this concept. Although it may be safe, during capture, to maintain an asserted scan enable signal for a scan chain that loaded constant values, it is not safe to do so for a scan chain that loaded non-constant values from the decompressor. This is because scan paths are almost never timed to operate at functional frequencies. So if one shifts a non-constant con-

Paper 13.2

Scan shift-in operations

The combinational part of the control block is designed as an m-input n-output XOR mapping circuit, where m is the number of control bits, and n is the number of scan chains. In other words, each output of the block is obtained by XOR-ing certain control bits. The XOR network is configured in such a way that it guarantees high encoding efficiency, i.e., a close to 100% ratio of successfully encoded pre-specified gating signals to the number of control bits. The control circuitry operates as follows. For all scan chains having specified bits, the corresponding gating signals should be chosen in such a way that the AND gate have 1 on its input. Consequently, these scan chains can be driven directly from the decompressor. This process continues until pre-specified power limits are reached. As for the remaining scan chains, one can determine, using the XOR network and the control bits found earlier, those gating signals that were not encoded. The gating circuitry of Fig. 3 allows the decompressor to drive approximately 50% of such scan chains, while the remaining ones receive a constant 0. Indeed, a single output of the XOR network is producing 0 with probability 0.5 (except rare cases when the number of asserted control variables is very small), and thus the gating signal reaches the AND gate and then a scan chain as a constant 0. Note that several experimental results [5] indicate lower switching activity when the scan chains are loaded with the constant 0 rather than the con-

INTERNATIONAL TEST CONFERENCE

3

SE

SE



Two constant stimuli. In certain applications it might be desirable or convenient to feed scan chains with both logic values of 0 and 1 acting as constant stimuli. Consider scan chains that were not the subject of encoding, as discussed in the previous paragraph. The gating circuitry shown in Fig. 4 allows the decompressor to drive approximately 25% of such scan chains, while 50% of them receive a constant 0, and 25% have a constant 1. In other words, roughly 75% of scan chains with no specified bits will have no transitions at all. Indeed, as shown earlier, a gating signal reaches the AND gate and then a scan chain as a constant 0 with probability 0.5. On the other hand, in order to get a constant 1, two gating signals are needed - one set to 0 and driving the OR gate, and the other one set to 1 and feeding the AND gate. Clearly, the XOR network outputs this combination with probability 0.25.

depicted in Fig. 5. It may also consist of multiple modules with different probabilities of forcing constant stimuli. A programmable controller is then used to select one of the biasing circuits. Depending on the implementation, the reconfigurable biasing circuit can be used in two modes: • the selected biasing circuit remains unchanged during the whole scan shift period, • various biasing circuits are chosen for different scan chain segments during scan shift.



Decompressor

stant 1. Even if this is circuit-specific feature, it nevertheless appears to be the case across several designs.

SE



Scan enable control

SE

XOR network Control register

… Scan Enable

Test Mode

XOR network

Figure 4

Producing constants 0 and 1

The fraction of scan chains driven by the decompressor can be changed in the manner of Fig. 5 by adding a biasing circuitry, i.e., the group of AND gates driven by the XOR network. With these gates in place, the decompressor is capable of driving approximately 25% of scan chains. This percentage can be reduced even further by adding more inputs to the AND gates. For example, the third input reduces the percentage of scan chains driven by the decompressor down to 12.5%, while the fraction of scan chains getting the constant value of 0 increases accordingly, as shown earlier. Reduction of control data volume. When generating test patterns, it is possible to make different test patterns sharing the same control data such that the control data can be loaded into the programmable controller only once for multiple test patterns and only the unique control data is stored in the external tester. To maximize the control data to be shared with different test patterns, the ATPG can be enhanced to prefer the control data that meets current requirement and is used by the test patterns generated before when there are multiple choices. Reconfigurable biasing logic. To further enhance ability of a biasing circuit to reduce the switching activity during scan shift, it can be designed as a reconfigurable device. In its simplest form, it can allow a programmable selection of the XOR logic outputs employed to drive AND gates, as

Paper 13.2

Control register Scan Enable

Figure 5

5.

Test Mode

Biasing circuitry

Enhanced scan shift-in

The low fill rates make it also possible to deliver the identical test data to scan chains for a number of shift cycles directly from the decompressor, thereby further reducing the total number of transitions. In order to implement this idea, however, a mechanism is required to sustain the outputs of a decompressor for more than a single clock cycle, while allowing the decompressor to change its internal state to ensure successful encoding of next specified bits. Our low power solution assumes that the scheme presented in the previous section is integrated with the on-chip test data decompressor as shown in Fig. 6 (its stand-alone version is presented in [11]). The same data are now provided to the scan chains for a number of shift cycles through a shadow register placed between the ring generator and the phase shifter. It captures and saves, for a number of cycles, a desired state of the ring generator, while the generator itself keeps advancing to the next state needed to encode another group of specified bits. As a result, independent operations of the ring generator and its shadow register allow virtually any state which causes no conflicts with specified bits to reduce a transition count. A single input channel suffices to facilitate the operation of

INTERNATIONAL TEST CONFERENCE

4

the shadow register. Every shift cycle, a control bit is sent to the decompressor in order to indicate whether the shadow register should be reloaded with the current content of the ring generator. If a given control bit is set to 1, then the shadow register updates its state before the ring generator reaches its next state. Instead of the control channel, one can use the decompressor input channels to deliver the control information merged with the seed variables, as shown in Fig. 6. All inputs drive an XOR tree which basically computes a parity signal for input variables injected during a given shift cycle. If the parity of inputs is odd, then the shadow register is reloaded before new seed variables enter the ring generator. Otherwise, the content of the register remains unchanged. As a result, in addition to silent scan chains loading constant stimuli, one can reduce the degree of toggling in those scan chains that are driven directly by the decompressor, thus reducing the total switching activity even further.

Phase shifter

Shadow register

Power reduction circuitry

+

SE



Ring generator

SE

SE

Control data

Figure 6

6.

Shadow register in decompressor

Capture and scan shift-out

Consider now scan operations during capture of test responses and shifting them out of the scan chains. Figures 3, 4, and 5 show the solution in which the same XOR network as before acts as a capture control circuitry that gates the scan enable signals. In Fig. 3, for instance, once all control variables are known, status of those scan chains that were not the subject of encoding will be determined such that approximately 50% of these scan chains may remain in the scan shift mode, that is, they do not record test results since the scan enable signal is asserted, while the remaining scan chains capture actual test responses. As a result, the power dissipation during capture cycles can be significantly reduced. Furthermore, many of the scan chains that do not capture test responses maintain their low toggle test patterns loaded earlier. Consequently, when the entire circuit is placed in the scan mode again, that preserved scan content facilitates suppression of flip-flop tog-

Paper 13.2

gling during shift out operations. It is worth noting that selecting different constant values loaded into the scan chains fed by the constant test stimulus may have different impact on the switching activity during capture. Preferred fill is proposed in [28]. It uses signal probability to determine the filling value during test pattern generation in order to reduce the switching activity during capture. Assuming that the scheme of Fig. 4 is employed, this strategy can be adopted here to select the constant stimulus in such a way that the constant 0 is chosen anytime the number of scan cells with the preferred value 0 is greater than or equal to the number of scan cells with the preferred value 1. Clock gating. Asserting separate scan enable signals during the capture cycles for each scan chain or group of scan chains that load constant values may not be practical in some design flows. An alternative solution with no impact on the design flow is to leverage the embedded clock gaters already available in many designs to reduce the power consumption. The clock gaters are simple devices inserted between a clock line and the corresponding state elements. If set properly, they prevent clock pulses from reaching selected flip-flops. As a result, the capture power can be reduced by forcing those memory elements to hold their states. In a mission mode, the clock gaters are controlled by functional enable logic fed by scan cells. To support scan shift operations, clock gaters are forced on when a test mode or scan enable are asserted. It is worth noting that ATPG can justify the functional enable logic to its off-state, thus preventing many downstream flip-flops from toggling. This can be accomplished by using a relatively small amount of test data provided by scan chains. In other words, during capture, the clock gaters are controlled by scan values loaded through the EDT decompressor. Clearly, the primary objective is to meet the power constraints by disabling flip-flops with as few additional specified bits as possible to minimize impact on compression. Furthermore, many designs may feature a hierarchy of clock gaters. If all flip-flops in a specific area of the design do not capture any fault effects, a higher-level clock gater may be turned off by using fewer specified bits. The method for enhancing ATPG to reduce capture power by controlling clock gaters is described in the next section.

7.

Encoding algorithm and ATPG flow

Loading scan chains with constant values. It is worth noting that the actual number of scan chains that can be driven by the decompressor depends on the encoding capabilities of the XOR network and the control register. Since the encoding process is equivalent to solving a set of linear equations, setting a gating signal to a pre-specified value requires one, two, or three control bits (variables). For instance, allowing the decompressor to drive a given scan chain with specified bits requires solving two equations (one per each gating signal) for the circuit of Fig. 4. Note

INTERNATIONAL TEST CONFERENCE

5

that in this particular case, an encoding failure is not possible, as a request to drive too many scan chains may only result in setting all variables to the value of 1. Assuming that the XOR network is comprised of 3-input XOR gates, loading the control register with all-1 pattern will result in all scan chains driven by the decompressor. This feature can also be used to turn off the low power decompression circuitry. The gating circuitry shown in Fig. 3 and Fig. 5 needs one and two equations per scan chains, respectively. If one decides to combine the solution of Fig. 4 with an extra biasing logic of Fig. 5, then three equations are required to determine the gating status of a single scan chain. Following is a simplified test pattern generation flow for one test pattern that incorporates constant stimuli and scan enable control: generate and compress test cube for one fault; give solver list of chains with specified bits or observation points; solve ignoring power constraints if necessary; while (abort and power limits not reached) { if (any chain must receive constant 0’s for power constraints) don’t justify 1 or observe fault on it; generate and merge test for another fault; give low power solver list of chains driven by decompressor; if (low power solver succeeded) { compress new combined test cube; if (decompressor solver fails) backtrack both solvers; } else backtrack; } add and try to solve equations xi = 0 for all i to possibly reduce the number of control variables set to 1;

Encoding procedure for the shadow register. In order to further reduce switching activity in the scan chains driven directly by the decompressor, an appropriate compression procedure is used in conjunction with the circuitry of Fig. 6. In principle, it works as follows [11]. It partitions a given test cube into blocks comprising certain number of consecutive time frames such that there are no specified bits of the opposite values in a given scan chain inside the blocks. This feature allows one to repeat a given decompressor state many times in succession by using the shadow register storing a state that the ring generator entered at the beginning of a block. The actual block size is determined by the ability to encode some of the specified bits occurring within boundaries of the block. The encoding process begins with a block and the corresponding state of the ring generator which should be applied first,

and it gradually moves towards the other end of the scan chains. As long as the solver can compress the test cube, the algorithm works by repeatedly increasing the size of the block, adding new equations, and invoking the solver again. Clearly, at some point a solution may not exist anymore. This particular time frame is then assigned a new block, and the procedure continues. There are certain particulars that distinguish the low power compression from the conventional EDT. Although it is still performed by treating external test data as Boolean variables, and by conceptually filling all scan cells with linear functions of input variables, it handles specified bits differently. Typically, linear equations correspond directly to specified bits occurring in successive slices of a test cube. Now, only one bit chosen from a set of specified locations having the same value and occurring in the same scan chain of the same block becomes the subject of encoding as the remaining ones are automatically covered by the content of the shadow register. Moreover, all equations are expressed in terms of variables injected until beginning of a given block. This is because a ring generator state which is to cover the indicated block has to be completely determined before it is moved to the shadow register during a next cycle, i.e., when the decompressor starts feeding the scan chains. Hence, designated specified bits are conceptually moved to the beginning of the block, in order to redefine their equations. Finally, one extra equation per time frame is always added to represent a request to reload the shadow register with the content of the ring generator, if necessary, before new variables will change the state of the generator (compare Fig. 6). Controlling clock gaters. First, we carry out a structural analysis to identify all the clock gaters for each available clock. The identified clock gaters are then targeted by ATPG, prior to pattern generation, to obtain clock control cubes (CCC) that disable the targeted clock gater. Due to possible overlap in the control logic of clock gaters, a CCC may additionally enable or disable other clock gaters. Once the CCCs for a clock are generated, the cubes are ordered by the decreasing number of state elements whose clocks are turned off by each cube. After dynamic compaction, compatible CCCs are merged in the aforementioned order so as to maximize gating off the flip-flops that do not need to be clocked, while the number of additional specified bits that must be provided by scan is minimized. Note

Table I. Circuit characteristics

Paper 13.2

Design

Gates

Scan chains (no × size)

Channels

D1 D2 D3 D4

470K 1.3M 2.0M 2.8M

164 × 169 203 × 300 832 × 158 321 × 258

4 4 8 4

Load WTM

Capture WSA

Unload WTM

Peak WSA (%)

49.26 49.63 48.90 49.46

13.89 19.87 31.14 7.25

41.34 51.92 38.67 48.01

23.44 37.82 41.44 13.35

EDT switching activity (%)

INTERNATIONAL TEST CONFERENCE

6

that during dynamic compaction, it is necessary to routinely check that there remains enough encoding capacity to merge CCCs such as to meet the power constraints. Further capture power reduction. For some designs, clock gating may provide insufficient power reduction if the design’s clock gating architecture do not provide sufficient control granularity, or if the clocks of many flip-flops cannot be gated off. In such cases, the ATPG process can be augmented by traditional ATPG power reduction methods such as preferred fill [28]. Such methods typically require a large number of specified bits and are therefore not compatible with embedded compression. However, by identifying more critical flip-flops whose toggling leads to greater switching activity, any excess encoding capacity can be used to ensure that those critical state elements retain their states.

8.

Experimental results

The proposed low power schemes were tested on several industrial designs. This section reports results for some of them, ranging in size from 470K to 2.8M gates. The basic data regarding the designs, including the number of gates, the scan configuration, and the number of compressed channels are listed in the left part of Table I. Toggle rates. The primary objective of the experiments is to measure the amount of toggling, and to compare the switching activity when applying test patterns produced both in a conventional EDT environment and by using the low power scheme. Scan shift operations dissipate power which depends directly on the number of transitions that occur in the scan chains and other parts of the CUT. The resultant switching activity is estimated by the weighted transition metric (WTM) that counts the number of invoked transitions in successive scan cells, while taking into account their relative positions. Let m be the length of a scan chain, and T = b1b2 … bm represent a test vector with bit bk scanned in before bk+1. The normalized form of the metric is then defined as follows:

S = 2[m( m + 1)]

−1

m −1

∑ (m − i)(b ⊕ b i

i =1

i +1

)

The average scan power dissipated during test application is obtained by summing up results provided by the above formula over all scan chains and all test patterns. The same rules apply to unload transitions observed when scanning out test responses. To measure toggling in the capture mode, the switching activity at each gate in the circuit is used. The weighted switching activity (WSA) of a test pattern p is defined as follows: f +1 g

W p = C −1 ∑∑ t k Fk i = 2 k =1

where tk = 1 if gate k toggles in frame i, and tk = 0 otherwise, Fk is the fan-out of gate k, g is the total number of gates, f is the number of frames in test pattern p (f + 1

Paper 13.2

represents the post capture frame), and C is the number of cycles of test pattern p. As before, the final results are obtained by summing the above formula over all test patterns and reporting its normalized value with respect to the worst case scenario, i.e., when all gates toggle. All experiments reported in the paper were performed for both stuck-at tests and launch-off-capture transition tests. For a given design, the type of patterns generated had little impact on the reported power reduction. Table II. Clock gater control statistics Design

Bits/cube

Gated FFs

FFs/bit

%FFs with gated clocks

D2 D1 D3 D4

7.6 7.6 15.6 5.1

942 501 4433 553

124 65 284 107

61% 70% 91% 52%

Average

9.0

1607

145

68%

Clock gater control. The motivation to have ATPG control clock gaters was to prevent a relatively large number of flip-flops from toggling using relatively few specified bits, which is a requirement for embedded compression. Table II quantifies those benefits. Since each CCC may affect multiple clock gaters, the statistics center on the CCCs rather than the individual clock gaters. The second column reports the average number of specified bits per CCC that must be provided by scan. This represents the average encoding capacity cost per CCC. The third column contains the average number of flip-flops whose clocks are shut off by one CCC. This provides the average benefit. The fourth column has the ratio of those two numbers: the average number of flip-flops turned off per specified bit. Finally, the last column contains the percentage of all flip-flops in the design whose clocks may be gated off. Obviously, if this percentage is not high, ATPG will be restricted in how much switching reduction is possible by this method alone. The design trend is towards greater use of clock gating especially for mobile devices or those where thermal dissipation is a concern. The first group of experiments addresses the load scan shift switching when a constant value of zero in conjunction with the decompressor shadow register are used to fill the scan chains. Moreover, a biasing circuitry of Fig. 5 is deployed to help maintaining the amount of scan chains driven by the decompressor at a desired level. Results shown in Table III provide the following information for each test case: • low-power decompressor method(s) used: constant loading of scan chains (CL) and/or holding the decompressor output constant for multiple cycles (OH), • the size of the control register, • the switching rates for scan load, capture, and scan unload modes, respectively; note that reducing load switching has a positive impact on switching activity

INTERNATIONAL TEST CONFERENCE

7

Table III. Experimental results – filling scan chains with constant values Design

Ctrl reg.

Method CL OH CL+OH CL OH CL+OH CL OH CL+OH CL OH CL+OH

D1

D2

D3

D4

48

64

256

96

Low power switching activity (%) Reduction Capture Reduction Unload

Load 15.87 10.51 2.03 11.34 13.20 3.69 11.04 21.31 6.37 13.68 12.63 3.59

67.78 78.66 95.88 77.15 73.40 92.56 77.42 56.42 86.97 72.34 74.46 92.74

9.97 13.10 9.57 20.05 19.04 19.92 19.64 30.24 20.11 3.92 7.10 4.00

28.22 5.69 31.10 -0.91 4.18 -0.25 36.39 2.89 35.42 45.93 2.07 44.83

during capture and unloading of scan chains - this is why these two corresponding figures of merit are included in the table, • a switching reduction factor, i.e., the ratio of the standard EDT switching (presented earlier, for each design, in Table I) to the low power switching, • the peak capture cycle switching activity across all test patterns. In all experiments, the threshold was set such that when possible, at least 70% of the scan chains will be loaded with constant values. As can be seen, there is a significant reduction in the switching activity during scan loading and unloading. In addition, while not explicitly targeted by this method, there Table IV. Experimental results – clock gaters Design D1 D2 D3 D4

Switching (%) Capture

Reduction

Peak (%)

ΔPC (%)

ΔBCE (%)

12.72 16.08 31.26 4.58

8.42 19.07 57.42 36.83

22.61 38.00 41.54 10.15

-11.86 23.13 15.96 -7.95

-0.13 -0.04 -0.28 -1.16

13.67 15.12 5.16 29.14 25.30 23.71 11.95 24.70 9.77 14.56 20.84 6.89

Reduction

Peak (%)

ΔPC (%)

ΔBCE (%)

66.93 63.43 87.52 43.88 51.27 54.33 69.10 36.13 74.73 69.67 56.59 85.65

22.93 22.20 20.98 27.29 27.84 27.19 32.14 41.66 32.21 8.51 13.28 9.17

34.47 -16.45 30.13 33.46 -0.17 36.38 12.03 -7.15 4.92 20.95 2.93 16.32

-0.20 -0.39 -0.63 -0.66 -0.22 -0.85 -0.85 -0.15 -0.91 -0.93 -0.43 -1.31

is often a reduction in the capture-cycle switching as well. Table IV reports the individual impact of using the clock gater control method on the capture switching activity. The threshold was set such that when possible, no more than 25% of the flip-flops that are driven by clock gaters are allowed to be clocked. The amount of power reduction possible depends on various factors including the percentage of flip-flops that may be gated off, and how many flipflops can be gated off with limited cost (encoding capacity). For designs like D3, where both criteria are wellsatisfied, significant power reduction is possible even through the design is set up for over 100x compression and therefore has limited encoding capacity. In addition to the reduction in average switching activity, the peak switching is often visibly reduced as well. In some cases, however, even though the average switching is significantly reduced, the peak switching is not because there are a few tests that necessarily generate significant switching activity. Consider for example the stuck-at-0 fault on the global signal that forces all clock gaters on. The pattern count in most cases is close to that of unconstrained ATPG. Finally, both methods are jointly used in experiments reported in Table V, where the combined impact of both low power scan shifting and constrained capture on toggling rates is summarized. When the methods are combined, it is

Table V. Experimental results – filling scan chains combined with clock gating Low power switching activity (%)

Design

Control register

Load

D1 D2 D3 D4

48 64 256 96

2.41 5.26 6.58 3.53

Paper 13.2

Reduction Capture 95.11 89.40 86.54 92.86

8.73 18.67 9.31 3.53

Reduction

Unload

Reduction

37.15 6.04 70.10 51.31

5.28 21.46 8.43 6.61

87.23 58.67 78.20 86.23

Peak (%) 18.43 26.57 31.76 9.13

INTERNATIONAL TEST CONFERENCE

ΔPC (%) ΔBCE (%) 38.46 37.78 38.70 29.59

-0.55 -0.84 -1.01 -1.94

8

important not to over-constrain the number of scan chains that may load non-constant values. Such constraints limit how many clock gater control cubes may be merged, which is why for some cases, the capture-cycle power reduction using only the clock gater control method exceeds when clock gater control is combined with the low-power decompressor. It is worth noting that in all experiments fault coverage is always given the highest priority. Therefore, even if a test for a single fault exceeds a threshold, such a test pattern is accepted. Having test patterns exceeding the power constraints is more common for tight thresholds. With the smaller number of low power scan chains, all test patterns typically meet the power constraints. Impact on pattern count. Application of tests in the low power mode reduces the degree of randomness observed in the scan chains. Additionally, restricting the number of scan chains that ATPG may assign values to or pulse to observe fault effects restricts dynamic compaction and therefore usually leads to higher pattern counts even though the schemes are designed to reduce the impact on compaction. Similarly, a larger control register offers ATPG greater flexibility and may therefore reduce the pattern count impact. For example, if the size of the control register used in design D3 is decreased by 50%, the power reduction is similar but the pattern count increases by 39%. In Tables III - V, the test pattern difference between unconstrained EDT and EDT with the low power scheme is represented by ΔPC. The pattern count increase varies per design and reflects the extent to which the low power scheme constrains dynamic compaction. One method for reducing pattern count impact to be explored is to generate patterns using unconstrained ATPG, filter out the patterns that exceed power thresholds, then use poweraware ATPG to incrementally generate patterns for the faults exclusively tested by the rejected patterns. Furthermore, employing the same experimental setup, we have used the bridging coverage estimate [1] to assess the impact of original EDT test patterns and low power stimuli on bridging defects. This metric is derived from data indicating how many times each stuck-at fault is detected. Data collected in the column ΔBCE of Tables III, IV and V indicate that the negative impact of low power test patterns on the bridging coverage is negligible. Impact on run time. The increase in ATPG run time is typically less than 15% for each individual method, and less than 20% when combined. This is when the pattern count is relatively unchanged. Of course, if more patterns are generated, the run time increases proportionally.

9.

Conclusions

In this paper, we propose a comprehensive scan test application scheme encompassing efficient techniques to reduce power dissipation in the continuous flow EDT environment during scan loading, unloading, and capture. Experi-

Paper 13.2

mental results confirm that for industrial designs, our scheme results in substantial reduction of test power. Switching activity during scan loading and unloading is reduced by up to 96% (24x) and 87% (8x), respectively, using the proposed low-power decompressor. Average switching during capture is reduced by up to 70% using a combination of low-power decompressor and clock gater control. Using only clock-gater control, switching is reduced by up to 57%. The peak capture switching is reduced by up to 32%. Such significant power reductions create a margin that allows accelerating scan shifting which, in turn, effectively decreases test application time. It also enables more of the design to be tested in a given session, enabling a reduction in test time as well. The same methodology allows one to operate within specified power consumption budgets without compromising test quality. Finally, the presented approach fits into the core-based design paradigm, and complements design partitioning schemes for power reduction.

10.

Acknowledgment

The work done by Dariusz Czysz has been supported by a doctoral scholarship provided by Mentor Graphics Corporation, Wilsonville, OR.

11.

References

[1] B. Benware, C. Schuermyer, S. Ranganathan, R. Madge, P. Krishnamurthy, N. Tamarapalli, K.-H. Tsai, and J. Rajski, “Impact of multiple-detect test patterns on product quality,” Proc. ITC, pp. 1031-1040, 2003. [2] S. Bhunia, H. Mahmoodi, D. Ghosh, S. Mukhopadhyay, and K. Roy, “Low-power scan design using first-level supply gating,” IEEE Trans. VLSI Systems, vol. 13, pp. 384-395, March 2005. [3] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, “A gated clock scheme for low power scan testing of logic ICs or embedded cores,” Proc. ATS, pp. 253-258, 2001. [4] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, “Efficient scan chain design for power minimization during scan testing under routing constraint,” Proc. ITC, pp. 488-493, 2003. [5] K. M. Butler, J. Saxena, T. Fryars, G. Hetherington, A. Jain, and J. Lewis, “Minimizing power consumption in scan testing: pattern generation and DFT techniques,” Proc. ITC, pp. 355-364, 2004. [6] A. Chandra and K. Chakrabarty, “Low-power scan testing and test data compression for system-on-a-chip,” IEEE Trans. CAD, vol. 21, pp. 597-604, May 2002. [7] A. Chandra and K. Chakrabarty, “A unified approach to reduce SOC test data volume, scan power and testing time,” IEEE Trans. CAD, vol. 22, pp. 352-362, March 2003. [8] M. Chiu and J. C.-M. Li, “Jump scan: a DFT technique for low power testing,” Proc. VTS, pp. 277-282, 2005. [9] K.Y. Cho, S. Mitra, and E.J. McCluskey, “California scan architecture for high quality and low power testing,” Proc. ITC, paper 25.3, 2007. [10] R. M. Chou, K. K. Saluja, and V. D.Agrawal, “Scheduling tests of VLSI systems under power constraints,” IEEE Trans. VLSI, vol. 5, pp. 175-185, June 1997. [11] D. Czysz, G. Mrugalski, J. Rajski, and J. Tyszer, “New test

INTERNATIONAL TEST CONFERENCE

9

[12]

[13] [14] [15]

[16] [17]

[18] [19] [20] [21] [22] [23] [24] [25]

[26] [27] [28]

[29]

[30]

data decompressor for low power applications,” Proc. DAC, pp. 539-544, 2007. V.P. Dabholkar, S. Chakravarty, I. Pomeranz, and S.M. Reddy, “Techniques for minimizing power dissipation in scan and combinational circuits during test application,” IEEE Trans. CAD, vol. 17, pp. 1325-1333, Dec. 1998. M. ElShoukry, C.P. Ravikumar, and M. Tehranipoor, “Partial gating optimization for power reduction during test application,” Proc. ATS, pp. 242-247, 2005. S. Gerstendorfen and H.-J. Wunderlich, “Minimized power consumption for scan-based BIST,” Proc. ITC, pp. 77-84, 1999. P. Girard, C. Landrault, S. Pravossoudovitch, A. Virazel, and H.-J. Wunderlich, “High defect coverage with lowpower test sequences in a BIST environment,” IEEE Design & Test of Computers, vol. 19, pp. 44-52, Sept.-Oct. 2002. A. Hertwig and H.-J. Wunderlich, “Low power serial builtin self-test,” Proc. ETW, pp. 49-53, 1998. T.-C. Huang and K.-J. Lee, “Reduction of power consumption in scan-based circuits during test application by an input control technique,” IEEE Trans. CAD, vol. 20, pp. 911-917, July 2001. T.-C. Huang and K.-J. Lee, “A token scan architecture for low power testing,” Proc. ITC, pp. 660-669, 2001. V. Iyengar and K. Chakrabarty, “Precedence-based, preemptive, and power-constrained test scheduling for system-onchip,” Proc. VTS, pp. 368-374, 2001. S. Kajihara, K. Ishida, and K. Miyase, “Test vector modification for power reduction during scan testing,” Proc. VTS, pp. 160-165, 2002. K.-J. Lee, T.-C. Huang, and J.-J. Chen, “Peak power reduction for multiple-scan circuits during test application,” Proc. ATS, pp. 453-458, 2000. J. Lee and N. A. Touba, “Low power test data compression based on LFSR reseeding,” Proc. ICCD, pp. 180-185, 2004. J. Lee and N. A. Touba, “LFSR-reseeding scheme achieving low-power dissipation during test,” IEEE Trans. CAD, vol. 26, pp. 396-401, Feb. 2007. X. Lin, D. Czysz, M. Kassab, G. Mrugalski, J. Rajski, and J. Tyszer, “Low power scan test application in test compression environment,” US patent application, 2007. N. Nicolici, B.M. Al-Hashimi, and A.C. Williams, “Minimization of power dissipation during test application in fullscan sequential circuits using primary input freezing,” IEE Proc. Computers Digital Techniques, vol. 147, pp. 313-322, Sept. 2000. J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, “Embedded deterministic test,” IEEE Trans. CAD, vol. 23, pp. 776792, May 2004. J. Rajski, J. Tyszer, G. Mrugalski, N. Mukherjee, W.-T. Cheng, and M. Kassab, “X-Press compactor for 1000x reduction of test data”, Proc. ITC, paper 18.1, 2006. S. Remersaro, X. Lin, Z. Zhang, S.M. Reddy, I. Pomeranz, and J. Rajski, “Preferred fill: a scalable method to reduce capture power for scan based designs,” Proc. ITC, paper 32.2, 2006. P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici, “Low power mixed-mode BIST based on mask pattern generation using dual LFSR re-seeding,” Proc. ICCD, pp. 474-479, 2002. P. M. Rosinger, B. M. Al-Hashimi, and N. Nicolici, “Scan

Paper 13.2

[31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44]

[45] [46]

[47] [48] [49] [50]

architecture with mutually exclusive scan segment activation for shift and capture power reduction,” IEEE Trans. CAD, vol. 23, pp. 1142-1153, July 2004. R. Sankaralingam, R. R. Oruganti, and N.A. Touba, “Static compaction technique to control scan vector power dissipation,” Proc. VTS, pp. 35-40, 2000. R. Sankaralingam, B. Pouya, and Nur A. Touba, “Reducing power dissipation during test using scan chain disable,” Proc. VTS, pp. 319-324, 2001. R. Sankaralingam and N. A. Touba, “Controlling peak power during scan testing,” Proc. VTS, pp. 153-159, 2002. J. Saxena, K. B. Butler, and L. Whetsel, “An analysis of power reduction techniques in scan testing,” Proc. ITC, pp. 670-677, 2001. C. Shi and R. Kapur, “How power-aware test improves reliability and yield,” EE Times EDA news online, 09/15/2004. O. Sinanoglu, I. Bayraktaroglu, and A. Orailoglu, “Test power reduction through minimization of scan chain transitions,” Proc. VTS, pp. 166-171, 2002. N.A. Touba, “Survey of test vector compression techniques,” IEEE Design & Test of Computers, vol. 23, pp. 294-303, July-Aug. 2006. S. Wang, “Generation of low power dissipation and high fault coverage patterns for scan-based BIST,” Proc. ITC, pp. 834-843, 2002. S. Wang and S. K. Gupta, “ATPG for heat dissipation minimization during test application,” IEEE Trans. Computers, vol. 47, pp. 256-262, Feb. 1998. S. Wang and S.K. Gupta, “DS-LFSR: A BIST TPG for low switching activity,” IEEE Trans. CAD, vol. 21, pp. 842-851, July 2002. S. Wang and S.K. Gupta, “An automatic test pattern generator for minimizing switching activity during scan testing activity,” IEEE Trans. CAD, vol. 21, pp. 954-968, Aug. 2002. S. Wang and S.K. Gupta, “LT-RTPG: A new test-per-scan BIST TPG for low switching activity,” IEEE Trans. CAD, vol. 25, pp. 1565-1574, Aug. 2006. S. Wang and W. Wei “A technique to reduce peak current and average power dissipation in scan designs by limited capture,” Proc. ASP-DAC, pp. 810-816, 2007. X. Wen, K. Miyase, S. Kajihara, T. Suzuki, Y. Yamato, P. Girard, Y. Ohsumi, and L.-T. Wang, “A novel scheme to reduce power supply noise for high-quality at-speed scan testing,” Proc. ITC, paper 25.1, 2007. X. Wen, Y. Yamashita, S. Kajihara, L. Wang, K. K. Saluja, and K. Kinoshita, „On low-capture-power test generation for scan testing,” Proc. VTS, pp. 265-270, 2005. X. Wen, Y. Yamashita, S. Morishima, S. Kajihara, L. Wang, K.K. Saluja, and K. Kinoshita, “Low-capture-power test generation for scan-based at-speed testing,” Proc. ITC, pp. 1-10, 2005. L. Whetsel, “Adapting scan architectures for low power operation,” Proc. ITC, pp. 863-872, 2000. Q. Xu, D. Hu, and D. Xiang, “Pattern-directed circuit virtual partitioning for test power reduction,” Proc. ITC, paper 25.2, 2007. C. Zoellin, H.-J. Wunderlich, N. Maeding, and J. Leenstra, “BIST power reduction using scan-chain disable in the Cell processor,” Proc. ITC, 2006, paper 32.3. Y. Zorian, “A distributed BIST control scheme for complex VLSI devices,” Proc. VTS, pp. 4-9, 1993.

INTERNATIONAL TEST CONFERENCE

10