Combining Scan and Trace Buffers for Enhancing Real-time Observability in Post-Silicon Debugging

Ho Fai Ko and Nicola Nicolici
Department of Electrical and Computer Engineering
McMaster University, Hamilton, ON L8S 4K1, Canada

Abstract
Scan is a well-known design-for-test technique in manufacturing test that has also been successfully applied to aid post-silicon debugging on testers. However, to achieve real-time observability in-field, embedded trace buffers are needed. In this paper, we discuss how, in the presence of enhanced scan chains, trace buffers can be utilized efficiently for real-time debug data acquisition in-field.
978-1-4244-5833-2/10/$26.00 ©2010 IEEE

I. INTRODUCTION

The purpose of pre-silicon verification methods, such as constrained-random simulation or formal techniques, is to identify and fix design errors before tape-out. Manufacturing test screens out the defective samples after fabrication. For the circuits that pass manufacturing test, post-silicon debugging searches for the design errors that have escaped to silicon. While the former two tasks are well researched and supported by automated methods, the latter still relies on engineering skill and on the ad-hoc methods adopted within a particular organization. Because of the increasing complexity of post-silicon debugging, structured methods for it have received more attention in recent years.

Post-silicon debugging is performed in two main phases. In the first phase, the objective is to learn how to control the failure; the main challenges of this phase are elaborated below. In the second phase, once the failure is controllable, the aim is to search for the root cause in space (identify the erroneous logic block) and in time (find the exact clock cycle when the bug is excited). The second phase is commonly performed in controlled environments (such as automated testers), where patterns that are known to trigger the failure are applied under different operating conditions. In this phase, the experiments are deterministic (also called repeatable), and the erroneous behaviour can be reproduced consistently. Because the design-for-test (DFT) infrastructure for manufacturing test is readily available on the chip, the tasks performed during this phase rely heavily on scan chains for improved controllability and observability [20].

The main challenge in post-silicon debugging, however, lies in understanding how to control the failure. The subtle design errors that escape to silicon do not manifest for the first time in a controlled environment. Rather, they are triggered in-field and at-speed, when the native applications are running in real time. Because of the latency between the bug excitation and its observation in the application, it is not known what caused the circuit to misbehave, or when. This problem is exacerbated by the fact that, in subsequent debug experiments, the circuit may operate without any in-field failure. Even if most of the inputs/outputs (I/Os) are controllable through program/data memories, there are still many sources of non-determinism in-field: asynchronous interfaces, interrupts from peripherals, and mixed-signal circuitry, to mention only a few. Therefore, the debug experiments in this phase are referred to as non-deterministic (also called non-repeatable). Finding a way to record the traces that have caused the erroneous behaviour, without halting real-time execution, is essential to understanding how to control the failure.

This has led to the adoption of embedded logic analysis as an effective technique for in-field, at-speed debugging that improves real-time observability. A typical embedded logic analyzer contains three components: a trigger unit, a sample unit, and an offload unit [1]. The trigger unit has programmable event detectors that monitor a set of connected trigger signals to determine when data acquisition should be initiated. To provide real-time observability, the sample unit contains a trace buffer (e.g., embedded memories) that acquires data from the trace signals. When the trace buffer is full, the offload unit unloads the sampled data for further processing. Due to the limited capacity of the trace buffer, only a subset of signals can be monitored with embedded logic analysis.

On the other hand, enhanced scan chains, in which each scan cell has two state elements, are used in manufacturing test to apply two consecutive stimuli, as needed to screen fabrication defects that affect circuit timing [7]. They are also known to be useful for taking state snapshots through scan without halting real-time execution during post-silicon debugging [14]. Nonetheless, the frequency at which state snapshots can be taken in real time in-field depends on the I/O bandwidth allocated for scan dumps.

In this paper, we describe a method in which the available embedded trace buffer space is used to store data from both trace signals and enhanced scan cells. We show how combining scan and trace can improve real-time observability during post-silicon debugging when the data acquired in real time is post-processed using the state restoration algorithm originally proposed in [12]. Since our method does not require data to be collected over multiple runs of the same debug experiment, its main application is data acquisition in non-repeatable environments.
[Fig. 1. Scan-based debug and trace buffer-based debug. (a) Enhanced scan chains 0 to n, with a detailed illustration of each enhanced scan cell (a flip-flop plus a shadow flip-flop, controlled by Scan enable and Shadow scan enable). (b) Trace buffers for real-time data acquisition (a debug unit with a debug controller, trace signals and a trace buffer attached to the circuit-under-debug) and offline post-processing using state restoration.]
II. PRELIMINARIES AND MOTIVATION

A number of solutions that reuse scan chains to improve the observability of internal signals during post-silicon debugging have been proposed (e.g., [4], [8], [10], [20]). However, since the circuit has to be stopped while the scan data is offloaded, scan chains alone cannot provide real-time observability into the circuit-under-debug (CUD). Since bugs may be exercised thousands of clock cycles apart [11], it may be beneficial to maintain circuit execution during a scan dump. This can be done with enhanced scan, which adds a state element to each scan cell. Although this requires additional hardware, enhanced scan has been commonly used for delay fault testing [18]. As shown in Figure 1(a), each enhanced scan cell has a shadow flip-flop, which is included in a shadow scan chain. When two consecutive scan patterns are applied for delay fault testing, the regular scan chain holds the initialization pattern and the shadow scan chain holds the excitation pattern. During real-time debugging, the state snapshot is captured into the shadow scan chain and shifted out without interrupting execution. Nevertheless, a new circuit state cannot be captured until the shadow scan chain has been emptied.

The ability to capture data continuously on a subset of signals is the primary reason why embedded logic analysis using trace buffers is employed to provide real-time observability. However, since the amount of data that can be captured is limited by the capacity of the on-chip trace buffers, a number of techniques have been proposed to address this issue (e.g., [1], [13], [15], [17], [19] for improving the design of embedded logic analyzers, and [2], [3], [6] for compressing the trace data). Recently, [12] introduced the concept of state restoration, which uses the data captured in the trace buffer to reconstruct the data in the remaining state elements across multiple timeframes.
Combined with the trace signal selection algorithm also proposed in [12] (subsequently improved in [16] and [21]), state restoration was shown to improve the real-time observability of the circuit without adding extra hardware. Debugging using trace buffers is illustrated in Figure 1(b). The on-chip debug controller initiates data acquisition in response to a programmable breakpoint or an assertion checker. The debug data is acquired on-chip, offloaded through a low-bandwidth interface, and subsequently post-processed offline.
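The acquisition flow just described (trigger, sample, offload) can be illustrated with a minimal Python sketch of an embedded logic analyzer at the level of clock cycles. It is an illustration under stated assumptions, not a description of any specific hardware: the class and method names are hypothetical, the trigger is an arbitrary predicate standing in for a programmable event detector, and only post-trigger acquisition (no circular pre-trigger buffering) is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Cycle = Dict[str, int]  # signal name -> value (0/1) observed in one clock cycle


@dataclass
class EmbeddedLogicAnalyzer:
    """Hypothetical cycle-level model of a trigger/sample/offload debug unit."""
    trigger: Callable[[Cycle], bool]  # trigger unit: programmable event detector
    trace_signals: Sequence[str]      # signals wired to the trace buffer inputs
    depth: int                        # trace buffer depth (one entry per clock cycle)
    buffer: List[List[int]] = field(default_factory=list)
    armed: bool = True                # waiting for the trigger condition
    sampling: bool = False            # acquisition in progress

    def clock(self, cycle: Cycle) -> None:
        """Advance one clock cycle."""
        if self.armed and self.trigger(cycle):   # trigger unit fires once
            self.armed, self.sampling = False, True
        if self.sampling:                        # sample unit: one row per cycle
            self.buffer.append([cycle[s] for s in self.trace_signals])
            if len(self.buffer) == self.depth:   # trace buffer full: stop sampling
                self.sampling = False

    def offload(self) -> List[List[int]]:
        """Offload unit: hand the sampled rows over for offline post-processing."""
        rows, self.buffer = self.buffer, []
        return rows
```

For example, with `depth=3` and a trigger on a hypothetical `err` signal that fires in the second cycle, the buffer records the trace signals in that cycle and the next two, after which sampling stops until the data is offloaded.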
Recently, [9] and [22] discussed how to combine trace and scan. In [9], the data acquired in the trace buffer is first analyzed to identify the time window in which a bug is detected. The debug experiment is then re-run until the data in all state elements in the targeted window has been scanned out. In [22], an automated method is proposed that discovers the root cause by analyzing the trace data obtained in multiple runs of an experiment from a known state (obtained through a scan dump). Both methods require the circuit responses to be consistent across multiple debug runs; hence, they are useful only when the experiments are repeatable. As pointed out in the previous section, when the experiments are non-repeatable, design bugs may not be excited in every debug session. It is thus essential to acquire, in a single debug experiment, data that provides sufficient observability, which can be quantified as the number of bits that can be inferred through state restoration (as in [12]) from the trace acquired in real time. What is interesting about the state restoration algorithm from [12] is that it reconstructs data across multiple timeframes in such a way that each timeframe is treated independently. Hence, even if the acquired data is not continuous (such as scan dumps from non-consecutive clock cycles), the algorithm still works. Based on this observation, and motivated by the fact that enhanced scan cells are already used for manufacturing test purposes (e.g., [14]), we propose an architecture that combines scan and trace to improve real-time observability. Both techniques have been individually documented as effective for acquiring debug data in real time. However, to the best of our knowledge, their combined effect on real-time observability has not been discussed in the public domain.
Given that on-chip trace buffers continue to be adopted, while the I/O bandwidth for offloading scan dumps in-field does not scale accordingly, we study for the first time the consequences of sharing the embedded memories between interleaved scan dumps (from the shadow scan chains) and real-time tracing of essential signals in consecutive clock cycles. As will be shown in our experimental results, when state restoration is performed on this hybrid set of scan and trace data, more data from the circuit may be reconstructed than when only the trace data from a larger group of trace signals is used.
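Why non-contiguous data remains usable can be illustrated with a simplified Python sketch of X-propagation. It is not the algorithm of [12], which handles general gate-level netlists; as a hypothetical simplification, each flip-flop's next state is a single AND or OR gate over other flip-flops. Each rule only relates two adjacent timeframes (forward: inputs at cycle t fix the output at t+1; backward: the output at t+1 constrains the inputs at t), so the known bits may come from any mix of traced signals and scan snapshots.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical netlist encoding: flip-flop -> (gate type, input flip-flops at the previous cycle).
Netlist = Dict[str, Tuple[str, List[str]]]
Values = Dict[Tuple[str, int], int]  # (flip-flop, clock cycle) -> known bit


def restore(netlist: Netlist, known: Values, cycles: int) -> Values:
    """Infer unknown flip-flop values by forward/backward implication to a fixed point."""
    vals: Values = dict(known)
    changed = True
    while changed:
        changed = False
        for out, (op, ins) in netlist.items():
            ctrl = 0 if op == "AND" else 1  # controlling value: 0 for AND, 1 for OR
            for t in range(cycles - 1):
                iv: List[Optional[int]] = [vals.get((i, t)) for i in ins]
                ov = vals.get((out, t + 1))
                if ov is None:  # forward implication from cycle t to cycle t + 1
                    if ctrl in iv:
                        ov = ctrl
                    elif all(v == 1 - ctrl for v in iv):
                        ov = 1 - ctrl
                    if ov is not None:
                        vals[(out, t + 1)] = ov
                        changed = True
                if ov is None:
                    continue
                if ov == 1 - ctrl:  # backward: non-controlled output => every input non-controlling
                    for i, v in zip(ins, iv):
                        if v is None:
                            vals[(i, t)] = 1 - ctrl
                            changed = True
                elif iv.count(None) == 1 and all(v in (None, 1 - ctrl) for v in iv):
                    # backward: controlled output and all other inputs non-controlling => last input controls
                    vals[(ins[iv.index(None)], t)] = ctrl
                    changed = True
    return vals
```

With `net = {"c": ("AND", ["a", "b"]), "a": ("OR", ["c", "b"])}`, tracing `b` for four cycles and adding a single snapshot of `a` and `c` at cycle 0 (6 known bits) lets the sketch infer all 12 flip-flop values in the window.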
[Fig. 2. The proposed architecture where the trace buffer is shared by trace signals and shadow scan chains. (a) The proposed architecture; (b) discard shadow scan chains to sample trace signals; (c) discard shadow scan cells for more frequent scan dumps. Each panel shows trace signals TS0 to TSw-1 and shadow scan chains (SC) 0 to w-1 feeding multiplexers at the input of the trace buffer, a configuration register controlling these multiplexers, and a scan controller driving the shadow scan enable of the enhanced scan cells; panels (b) and (c) mark the discarded shadow scan cells.]
III. PROPOSED ARCHITECTURE

The known techniques for embedded logic analysis employ on-chip embedded memories as trace buffers to sample data in consecutive clock cycles on a selected group of trace signals. Scan data, on the other hand, is offloaded separately through dedicated scan pins, in which case (due to the limited availability of such pins) each scan dump can take a long time. Furthermore, while dedicated application boards with sufficient I/O bandwidth for scan can be used for bring-up of high-volume products (such as microprocessors), that is not necessarily the case for form-factor-constrained embedded devices. To address this issue, we propose an architecture (shown in Figure 2) in which the storage space of the trace buffer is divided. While one part of the trace buffer samples data from trace signals in consecutive clock cycles, the data captured in the shadow scan cells is simultaneously offloaded into the remaining part. In this case, the number of shadow scan chains is limited by the width of the trace buffer. When a wider trace buffer is available, more and shorter shadow scan chains can be connected to it. This shortens the time required for each scan dump, effectively increasing the amount of scan data that can be gathered over the course of a debug experiment. When this hybrid set of data is used for state restoration, as will be shown in our experimental results, a large amount of data can be restored for the CUD.

A. Division of storage space for trace and scan

For a trace buffer of limited capacity whose storage space is shared between trace data and scan data, the obvious tradeoff is how many signals should be traced and how much scan data should be stored during a debug experiment.
To address this tradeoff, in the proposed architecture shown in Figure 2(a), we insert multiplexers at the input of the trace buffer, controlled by a programmable configuration register. This additional hardware allows the circuit to be reconfigured at runtime to collect different combinations of trace and scan data. The configuration register can be programmed to set the input multiplexers in different modes. For example, in Figure 2(b), the circuit is configured to monitor one trace signal (TS0), while the remaining input ports of the trace buffer collect data from the shadow scan chains. Note, however, that for each additional trace signal that is monitored, the data from one more shadow scan chain is discarded, as illustrated in Figure 3.

In all the examples of Figure 3, the trace buffer capacity is assumed to be 16 Kbits, organized as 16 x 1024 (width x depth). The circuit has 1600 shadow scan cells divided into 16 shadow scan chains (i.e., 100 scan cells in each chain). In Figure 3(a), the design is configured to sample data from 16 trace signals for 1024 clock cycles. In this mode, the debug architecture operates as a traditional embedded logic analyzer, gathering real-time data on the selected trace signals. Instead of continuously tracing a small group of signals, the proposed debug architecture can also be configured to sample data as shown in Figure 3(b). In this mode, only one signal is traced, while scan data from 15 of the 16 shadow scan chains is stored. With a shadow scan chain length of 100, a new set of scan data can be captured every 100 clock cycles. In the considered window of 1024 clock cycles, a total of 10 scan dumps are performed, comprising 1024 bits of trace data and 15000 bits of scan data. Note that the total amount of data collected in this configuration (16024 bits) is less than the 16 Kbits (16384 bits) collected when the buffer is used to trace 16 signals without any scan dumps.
However, since the scan data comes from a large number of shadow scan cells, the observability in each clock cycle in which scan data is captured is better than in any of the clock cycles in which the 16 trace signals are stored. Similarly, one can sample more trace signals by discarding more shadow scan chains. This is shown in Figure 3(c), where two signals are traced and only the data from 1400 shadow scan cells is stored (since two shadow scan chains are discarded). By adjusting the combination of data to collect, the debug engineer can fine-tune the debug experiments.

[Fig. 3. Example for storage tradeoff between trace and scan data; each panel plots the number of signals observed against the clock cycle (0 to 1024). (a) Continuous tracing of 16 signals: total acquired data = 16 x 1024 = 16384 bits. (b) Tracing one signal with scan dumps: 1500 x 10 + 1 x 1024 = 16024 bits. (c) Tracing two signals with scan dumps: 1400 x 10 + 2 x 1024 = 16048 bits. (d) Tracing two signals with more frequent scan dumps by discarding data in more shadow scan cells: 700 x 20 + 2 x 1024 = 16048 bits.]

Since the capacity of the trace buffer is fixed, there is a constraint on the maximum number of scan dumps one can perform. Consider the following terminology:
totalscan = total number of shadow scan cells
w = width of the trace buffer
d = depth of the trace buffer
k = number of trace signals being monitored
Then, the length of the longest shadow scan chain is $l = \lceil \mathit{totalscan} / w \rceil$, and the maximum number of scan dumps one can perform is $t_{max} = \lfloor d / l \rfloor$.

B. Scan dump frequency

There is another interesting tradeoff when employing the proposed architecture: how often a scan dump is performed versus how many shadow scan cells are observed. In the architecture shown in Figure 2, a scan controller determines how often data is captured into the shadow scan cells, and thereby how much of the captured data is offloaded into the trace buffer. This is done with a programmable counter in the scan controller, whose programmed value is used to derive the Shadow scan enable signal that drives all the shadow scan cells. Whenever the counter expires, the shadow scan cells capture data; otherwise, the shadow scan chains stay in shift mode to offload their data into the trace buffer. The lower the value programmed into the counter, the sooner it expires and the more often scan data is captured. However, when the expiration time is programmed to be shorter than the length of the shadow scan chains, some of the scan data is lost, because the data in some shadow scan cells is overwritten before it can be stored into the trace buffer. The effect of programming such a value into the scan controller is illustrated in Figure 2(c). Under this configuration, not only is the scan data from the unselected shadow scan chains discarded, but the shadow scan cells at the end of the monitored shadow scan chains (i.e., those that need the most shift cycles to reach the trace buffer) are also ignored. By allowing the data in some shadow scan cells to be discarded, one can perform more scan dumps than the maximum $t_{max}$ derived in the previous subsection. Thus, for a given trace buffer depth, the shadow scan cells that are observed are observed more often. When the circuit is configured in this mode to perform more scan dumps without decreasing the number of monitored trace signals, the total number of ignored shadow scan cells is

$\mathit{totalscan} - \frac{d}{t_{target}} \times (w - k)$,

where $t_{target}$ is the targeted number of scan dumps. This tradeoff is illustrated in Figure 3(d), where two signals are traced with the expiration time set to half of that used in Figure 3(c). This allows twice as many scan dumps to be performed. However, with this shorter expiration time, only half of the shadow scan cells in the monitored chains (700 of 1400) are observed.
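The storage arithmetic of Subsections III-A and III-B can be checked with a short Python sketch that reproduces the totals of Figure 3. It assumes balanced shadow scan chains (every chain has length l) and, as an assumption of this sketch, is parametrized by the scan controller's expiration interval in clock cycles rather than by $t_{target}$; for intervals no longer than l, the ignored-cell count reduces to $\mathit{totalscan} - \mathit{interval} \times (w - k)$, which matches the formula above with $\mathit{interval} = d / t_{target}$. The function and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Acquisition:
    dumps: int           # complete scan dumps that fit in the d-cycle window
    cells_per_dump: int  # shadow scan cells stored per dump, over all monitored chains
    trace_bits: int      # bits from the k continuously traced signals
    ignored_cells: int   # shadow scan cells whose data never reaches the buffer

    @property
    def total_bits(self) -> int:
        return self.dumps * self.cells_per_dump + self.trace_bits


def max_dumps(totalscan: int, w: int, d: int) -> int:
    """t_max = floor(d / l), with l = ceil(totalscan / w)."""
    return d // math.ceil(totalscan / w)


def plan(totalscan: int, w: int, d: int, k: int, interval: int) -> Acquisition:
    """Storage split for k trace signals and a given scan-controller interval (cycles)."""
    l = math.ceil(totalscan / w)   # length of the longest (here: every) shadow scan chain
    shifted = min(interval, l)     # cells per chain shifted into the buffer before the next capture
    cells = shifted * (w - k)      # only w - k buffer inputs carry shadow scan data
    return Acquisition(
        dumps=d // interval,
        cells_per_dump=cells,
        trace_bits=k * d,
        ignored_cells=totalscan - cells,
    )
```

With the Figure 3 parameters (totalscan = 1600, w = 16, d = 1024), `plan(1600, 16, 1024, 1, 100)` gives 10 dumps of 1500 cells plus 1024 trace bits (16024 bits), while `plan(1600, 16, 1024, 2, 50)` gives 20 dumps of 700 cells (16048 bits in total) with 900 shadow scan cells ignored.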
In the example shown in Figure 2, the discarded shadow scan cells either belong to an entire scan chain or are the cells that would require the shadow scan chains to stay in shift mode for more clock cycles to offload their content into the trace buffer. If adding extra multiplexers between the scan cells of the shadow scan chains for reconfiguration and bypassing is acceptable, more runtime options can be provided (so that different scan cells can be observed in different debug sessions).
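The capture/shift behaviour of the scan controller, and which cells are discarded when the counter expires early, can be reproduced with a cycle-level Python sketch of a single shadow scan chain. The timing convention is an assumption of this sketch: in every clock cycle the bit at the scan output is first written to the chain's trace-buffer column, and the chain then either captures a new snapshot (counter expired) or shifts one position toward the buffer. Cells are tracked by position, with position l - 1 adjacent to the scan output. `simulate_chain` and `counter_bits` are hypothetical names; the latter follows the counter-width estimate of Subsection III-C.

```python
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (position in the shadow scan chain, cycle of the capture)


def simulate_chain(l: int, interval: int, d: int) -> List[List[int]]:
    """Return, per capture, the chain positions whose captured bits reach the trace buffer."""
    chain: List[Optional[Cell]] = [None] * l
    stored: Dict[int, List[int]] = {}
    counter = 0
    for cycle in range(d):
        out = chain[-1]                       # bit currently at the scan output
        if out is not None:
            stored.setdefault(out[1], []).append(out[0])
        if counter == 0:                      # counter expired: Shadow scan enable selects capture
            chain = [(pos, cycle) for pos in range(l)]
            counter = interval - 1
        else:                                 # shift mode: move every bit one position toward the buffer
            chain = [None] + chain[:-1]
            counter -= 1
    return [stored[c] for c in sorted(stored)]


def counter_bits(l: int) -> int:
    """Width of a counter whose interval never exceeds l cycles: ceil(log2 l) bits."""
    return max(1, math.ceil(math.log2(l)))
```

For a 4-cell chain observed for 9 cycles, an interval of 4 stores every position of two snapshots (`[[3, 2, 1, 0], [3, 2, 1, 0]]`), whereas an interval of 2 yields four snapshots containing only positions 3 and 2: the cells farthest from the trace buffer are the ones discarded, as in Figure 2(c).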
C. Area investment

To support the architecture shown in Figure 2, additional hardware is introduced. Using the terminology from Subsection III-A, the area investment can be estimated as follows. First, w two-input multiplexers are placed at the input of the trace buffer. Second, a w-bit configuration register is included to control these multiplexers. Finally, the scan controller consists of a programmable counter that does not need to be wider than $\lceil \log_2 l \rceil$ bits. The size of the counter is related to the length l of the longest shadow scan chain because it takes no more than l clock cycles to offload the data from the shadow scan chains. If the targeted time between two scan dumps were larger than l, there would be a period in which the scan data has already been offloaded but new data has not yet been captured into the shadow scan cells (hence, storage space in the trace buffer would be wasted). It is important to note that, compared to the size of the trace buffer, which is already employed for debug, the additional hardware in the proposed architecture is negligible.

So far, the proposed architecture has been discussed under the assumption that full enhanced scan (where every state element in the circuit is replaced with an enhanced scan cell) is present. One may argue that, due to the high area investment of full enhanced scan, it is not always available in a design. In practice, partial enhanced scan can be used to lessen the area investment, and the proposed architecture can still be applied to such designs. In this case, since there are fewer shadow scan cells, the shadow scan chains are shorter, which allows more trace signals to be monitored or more scan dumps to be performed. Either way, the architecture's ability to fine-tune the set of data collected in the trace buffer, so as to aid state restoration in improving real-time observability, is maintained.

IV. EXPERIMENTAL RESULTS

Recall that post-processing the data acquired in the trace buffer with the state restoration algorithm from [12] increases the amount of data available to the debug engineer after a real-time debug experiment. Accordingly, we evaluate the benefits of acquiring various combinations of trace and scan data with the proposed architecture in terms of real-time observability, quantified as the amount of data restored in a simulator from the data acquired in the trace buffer. The reader is referred to [12] for more details on the implementation of the algorithms for state restoration and automated trace signal selection.

The experimental results for the proposed architecture on the three largest ISCAS89 benchmark circuits (i.e., s38584, s38417 and s35932) [5] are given in Table I, for a 32 Kbits trace buffer organized as 32 x 1024. In this case, 32 trace signals are selected and the shadow scan cells are divided into 32 shadow scan chains, each 64 cells long. This means that the trace buffer can store data from any of the 32 selected trace signals or the 32 shadow scan chains for a maximum of 1024 clock cycles. In our experiments, the trace signal selection algorithm from [12] is used to choose the trace signals with the highest restorability value, and the shadow scan cells in each shadow scan chain are ordered such that the flip-flops with the lowest restorability value are discarded first when the shadow scan chains are reconfigured. Note also that 10 sets of randomly generated data are used for the state restoration of each configuration of the architecture. In all the experiments, we allow at most 256 shadow scan cells to be discarded. This is because the three largest ISCAS89 benchmark circuits have only about 1500 state elements, and discarding more than 256 cells does not bring any benefit in terms of real-time observability.

TABLE I
DATA ACQUISITION USING A 32 KBITS TRACE BUFFER

Circuit  Trace    Scan   Discarded   Acquired     Restored     Restoration
         signals  dumps  scan cells  data (bits)  data (bits)  ratio
s38584   32       0      NA          32768        179109       5.47
s38584   0        16     0           23232        176677       7.60
s38584   1        17     64          24603        133740       5.44
s38584   1        18     128         24838        132795       5.35
s38584   1        20     256         24924        131978       5.30
s38584   2        18     128         25844        786484       30.43
s38584   2        19     256         24734        656952       26.56
s38584   4        18     256         25552        715172       27.99
s38417   32       0      NA          32768        323858       9.88
s38417   0        16     0           26176        167822       6.41
s38417   1        17     64          27731        177854       6.41
s38417   1        18     128         28150        130507       4.64
s38417   1        19     256         27225        126459       4.64
s38417   2        17     128         27650        950662       34.38
s38417   2        19     256         28230        938950       33.26
s38417   4        17     256         27488        1433055      52.13
s35932   32       0      NA          32768        1063448      32.45
s35932   0        16     0           27648        134627       4.87
s35932   1        14     64          24306        373537       15.37
s35932   1        15     128         25009        371893       14.87
s35932   1        16     256         24560        324651       13.22
s35932   2        14     128         24420        375197       15.36
s35932   2        15     256         24098        356213       14.78
s35932   4        14     256         24648        439038       17.81

Restoration ratio = restored data / acquired data.

As can be seen in Table I, for s38584 and s38417, the amount of data one can restore is very low when using only the trace data from 32 signals, only the data from scan dumps, or the data from one trace signal with periodic scan dumps. However, when the data from two or more trace signals is combined with scan data from various numbers of scan dumps, the amount of data available after state restoration increases by about 5 and 3 times for s38584 and s38417, respectively.
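The restorability-based ordering of shadow scan cells used in these experiments can be sketched in Python as follows. The restorability scores are treated as given inputs (their computation is described in [12] and is not reproduced here), and the flip-flop names and scores in the example are hypothetical. The returned list is in shift-out order: the first element reaches the trace buffer first, so when the scan-controller interval is shorter than the chain, the tail (lowest restorability) is what gets discarded.

```python
from typing import Dict, List


def order_chain(restorability: Dict[str, float]) -> List[str]:
    """Order one chain's flip-flops by decreasing restorability, in shift-out order.

    sorted() is stable even with reverse=True, so ties keep their original order.
    """
    return sorted(restorability, key=lambda ff: restorability[ff], reverse=True)


def observed_cells(chain: List[str], interval: int) -> List[str]:
    """Cells whose captured data reaches the trace buffer before the next capture."""
    return chain[:interval]


def discarded_cells(chain: List[str], interval: int) -> List[str]:
    """Cells overwritten by the next capture (the lowest-restorability tail)."""
    return chain[interval:]
```

For example, scores of 0.2, 0.9, 0.5 and 0.1 for `ff1` to `ff4` give the order `ff2, ff3, ff1, ff4`; with an interval of 2 cycles, `ff1` and `ff4` are the cells discarded.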
It is observed that when two or more signals are traced, global control signals (such as synchronous resets/enables) are among the selected trace signals and are therefore recorded in every clock cycle.
This shows that, by tracing a small set of control signals in consecutive clock cycles and using the remaining storage space for scan data from a much larger set of signals in non-consecutive clock cycles, the real-time observability of s38584 and s38417 can be significantly improved. On the other hand, for s35932, the largest amount of data is available after state restoration when the trace buffer is used to sample 32 trace signals without any scan dump. This is because, when 32 signals are traced, a large amount of data can be restored over the considered 1024 clock cycles due to the presence of large fan-in gates in the circuit. However, when only a few signals are traced and the remaining storage space of the trace buffer is used for scan data, many of the controlling signals of these large fan-in gates are not traced. Although a scan dump provides the data for more flip-flops in a particular clock cycle, without continuous trace data from the signals that feed the large fan-in gates, the acquired debug data cannot help restore a significant amount of data in the intervals between scan dumps. As a result, for this type of circuit, tracing more signals gives higher observability than performing more scan dumps, especially for large scan dump intervals. Therefore, it is beneficial to provide the ability to reconfigure the circuit in different modes, so that the decision on what type of data to collect can be made at runtime.

Another observation is that when more scan dumps are performed with fewer scan cells, the amount of data available after state restoration actually decreases. This is because, in our experiments, the discarded scan cells are chosen based on the restorability metric proposed in [12], which does not consider the presence of scan.
Therefore, an interesting topic for future research is to automatically decide which shadow scan cells are the best candidates for discarding, while also accounting for the fact that scan dumps, unlike trace signals, are not captured in consecutive clock cycles. An equally interesting topic for further investigation is to develop new metrics and algorithms for automated trace signal selection that account for the presence of enhanced scan chains and the duration of the scan dump interval.

V. CONCLUSION

Unlike existing approaches that consider either scan dumps (through enhanced scan chains) or real-time tracing of a subset of internal signals for post-silicon debugging, in this paper we have investigated their combination. Our experiments have shown that a hybrid set of debug data may improve the real-time observability of the circuit compared to performing only scan dumps or only tracing a subset of signals. The proposed architecture provides the flexibility to decide at runtime what type of data is acquired, which ultimately enables more efficient use of the limited storage space in on-chip trace buffers.
REFERENCES
[1] M. Abramovici, P. Bradley, K. Dwarakanath, P. Levin, G. Memmi, and D. Miller, “A Reconfigurable Design-for-Debug Infrastructure for SoCs,” in Proceedings of the IEEE/ACM Design Automation Conference, 2006, pp. 7–12.
[2] E. Anis and N. Nicolici, “Low Cost Debug Architecture using Lossy Compression for Silicon Debug,” in Proceedings of the IEEE/ACM Design, Automation and Test in Europe, 2007, pp. 225–230.
[3] E. Anis Daoud and N. Nicolici, “Real-Time Lossless Compression for Silicon Debug,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 9, pp. 1387–1400, Sep 2009.
[4] ARM Limited, “The ARM7TDMI Debug Architecture,” ARM Limited, Tech. Rep. ARM DAI 0028A, Dec 1995.
[5] F. Brglez, D. Bryan, and K. Kozminski, “Combinational Profiles of Sequential Benchmark Circuits,” in Proceedings of the IEEE International Symposium on Circuits and Systems, 1989, pp. 1929–1934.
[6] M. Burtscher, I. Ganusov, S. Jackson, J. Ke, P. Ratanaworabhan, and N. Sam, “The VPC Trace-Compression Algorithms,” IEEE Transactions on Computers, vol. 54, no. 11, pp. 1329–1344, Nov 2005.
[7] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Boston: Kluwer Academic Publishers, 2000.
[8] R. Datta, A. Sebastine, and J. Abraham, “Delay Fault Testing and Silicon Debug using Scan Chains,” in Proceedings of the IEEE European Test Symposium, 2004, pp. 46–51.
[9] J. Gao, Y. Han, and X. Li, “A New Post-Silicon Debug Approach Based on Suspect Window,” in Proceedings of the 27th IEEE VLSI Test Symposium, 2009, pp. 85–90.
[10] X. Gu, W. Wang, K. Li, H. Kim, and S. Chung, “Re-using DFT Logic for Functional and Silicon Debugging Test,” in Proceedings of the IEEE International Test Conference, Oct 2002, pp. 648–656.
[11] D. Josephson, “The Manic Depression of Microprocessor Debug,” in Proceedings of the IEEE International Test Conference, Oct 2002, pp. 657–663.
[12] H. F. Ko and N. Nicolici, “Algorithms for State Restoration and Trace Signals Selection for Data Acquisition in Silicon Debug,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 2, pp. 285–297, Feb 2009.
[13] H. F. Ko and N. Nicolici, “Resource-Efficient Programmable Trigger Units for Post-Silicon Validation,” in Proceedings of the IEEE European Test Symposium, 2009, pp. 17–22.
[14] R. Kuppuswamy, P. DesRosier, D. Feltham, R. Sheikh, and P. Thadikaran, “Full Hold-Scan Systems in Microprocessors: Cost/Benefit Analysis,” Intel Technology Journal, vol. 8, no. 1, pp. 63–72, Feb 2004.
[15] R. Leatherman and N. Stollon, “An Embedding Debugging Architecture for SOCs,” IEEE Potentials, vol. 24, no. 1, pp. 12–16, Feb-Mar 2005.
[16] X. Liu and Q. Xu, “Trace Signal Selection for Visibility Enhancement in Post-Silicon Validation,” in Proceedings of the IEEE/ACM Design, Automation and Test in Europe, 2009, pp. 1338–1343.
[17] A. Mayer, H. Siebert, and K. McDonald-Maier, “Boosting Debugging Support for Complex Systems on Chip,” IEEE Computer, vol. 40, no. 4, pp. 76–81, Apr 2007.
[18] J. Rearick, “Too Much Delay Fault Coverage Is a Bad Thing,” in Proceedings of the IEEE International Test Conference, 2001, pp. 624–633.
[19] M. Riley and M. Genden, “Cell Broadband Engine Debugging for Unknown Events,” IEEE Design and Test of Computers, vol. 24, no. 5, pp. 486–493, 2007.
[20] G. Van Rootselaar and B. Vermeulen, “Silicon Debug: Scan Chains Alone are Not Enough,” in Proceedings of the IEEE International Test Conference, 1999, pp. 892–902.
[21] J.-S. Yang and N. A. Touba, “Automated Selection of Signals to Observe for Efficient Silicon Debug,” in Proceedings of the 27th IEEE VLSI Test Symposium, 2009, pp. 79–84.
[22] Y.-S. Yang, N. Nicolici, and A. Veneris, “Automated Data Analysis Solutions to Silicon Debug,” in Proceedings of the IEEE/ACM Design, Automation and Test in Europe, 2009, pp. 982–987.