Thermal-Safe Dynamic Test Scheduling Method

Copyright © 2012 American Scientific Publishers All rights reserved Printed in the United States of America

Journal of Low Power Electronics Vol. 8, 1–11, 2012

Thermal-Safe Dynamic Test Scheduling Method Using On-Chip Temperature Sensors for 3D MPSoCs

Rama Kumar Pasumarthi¹, V. R. Devanathan², V. Visvanathan², Seetal Potluri¹, and V. Kamakoti¹ ∗

¹ Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
² Texas Instruments, Bengaluru, India

(Received: 2 July 2012; Accepted: 18 October 2012)

System test and online test techniques are being used aggressively in today's SoCs to improve test quality and reliability (e.g., aging/soft-error robustness). With the growing popularity of vertical integration, such as 2.5D and 3D, in the semiconductor industry, ensuring the thermal safety of SoCs during these test modes poses a challenge. In this paper, we propose a dynamic test scheduling mechanism for system tests and/or online tests that uses dynamic feedback from on-chip thermal sensors to control temperature during shift (or scan) and capture, thereby ensuring thermal-safe conditions while applying the test patterns. The proposed technique is a closed-loop test application scheme that eliminates the need for separate thermal simulation of test patterns at the design stage. The technique also enables granular field-level configuration of thermal limits, so that different units across multiple cores can be subjected to customized thermal profiles. Results from an implementation of the proposed schemes on a 4-layer, 16-core, 12.8-million-gate OpenSparc S1 processor subsystem are presented.

Keywords: 3D Test, Online Test, Thermal-Safe Testing, Multicore Computer Systems, Dynamic Test Scheduling, Thermal Sensors, System-on-Chips.

1. INTRODUCTION

Increasing power consumption in lower nanometer technologies has led to the problem of overheating, as packaging materials are unable to dissipate heat effectively [1]. This problem is exacerbated in three-dimensional (3D) ICs, where layers farther from the heat sink are not cooled effectively [2], resulting in overheating. Figure 1 shows the increase in temperature while testing (scan-based system test) a 4-layer, 16-core (4 cores per layer) OpenSparc 3D-MPSoC that has no thermal-safety mechanism in place. It is important to note that the cores on the topmost layer can heat up to 200 °C within a few time intervals after the scan test starts, if the temperature is not controlled. The higher power density of 3D ICs compared with 2D ICs can cause thermal hotspots to form. These hotspots can degrade the performance and reliability of the device and can possibly destroy it. Methods by Coskun et al. focused on temperature management using liquid cooling for 3D architectures, but

∗ Author to whom correspondence should be addressed. Email: [email protected]

J. Low Power Electron. 2012, Vol. 8, No. 5

this can prove to be expensive [3]. The problem of overheating and thermal hotspots is even more important during testing, because power consumption during test is higher than in functional mode [4]. Traditional approaches that limit test power dissipation have been shown to be insufficient to prevent thermal violations [5, 6]. This has led to several techniques, reported in the literature, that deal with test scheduling under thermal, power and resource constraints.

Traditional thermal-aware test scheduling requires extensive thermal simulation of the given chip at the design stage to compute various static test schedules. Such thermal simulation approaches have several issues, which are summarized as follows:
• The thermal behavior of a chip is sensitive to environmental conditions. This implies that the actual thermal profile of the chip during execution in the field may differ from the thermal profile estimated at the design stage. For example, the initial temperature of the chip-under-test that was assumed when estimating its thermal profile may not be correct. Also, the external environment of the chip when deployed in the field has to be modeled accurately at the design stage to ensure accurate estimation of the thermal profile. Over-estimating the temperature might lead

1546-1998/2012/8/001/011

doi:10.1166/jolpe.2012.1226


Fig. 1. Thermal profile of a 4-layer 16-core OpenSparc S1 processor subsystem with scan-based system test, assuming no thermal control. X-axis: Each time interval is 13.5 s; Y-axis: Temperature in °C.

to test schedules that cause an unnecessary increase in test access time, whereas under-estimating it could lead to violation of the thermal constraints and damage to the chip. In the case of system/online test, static schedules are not applicable at all, as the initial thermal profile of the chip in the field prior to application of the test is unknown. Thus, the accuracy of the thermal simulation models and power models, and accurate prediction of the initial thermal state, are critical yet difficult to achieve in system test and/or online test scenarios; and,
• Complete thermal simulation is time-consuming and orders of magnitude slower than full-fledged timing simulation. With test pattern generation and verification being in the critical path for tape-out of designs, ensuring the thermal safety of patterns by thermal simulation is impractical for large designs.

To sum up, static scheduling and sensor-based application of static schedules might work when patterns are applied from a tester in a controlled environment, but cannot be used for online test, as the initial temperature values themselves are unknown. Thus, there is a need for dynamic thermal management during test and functional modes.

The rest of this paper is organized as follows: Section 2 presents the literature survey and the contributions of this paper. Section 3 describes the proposed thermal-safe dynamic test scheduling methodology. Section 4 details the experimental setup. Section 5 presents the experimental results and their interpretation. Section 6 concludes the paper and presents interesting open issues.

2. BACKGROUND AND PRIOR WORK

Many low-power test generation techniques have been extensively researched in the literature [7, 8]. However, limiting power dissipation alone cannot guarantee against thermal hotspots, owing to lateral thermal interactions. In addition, in the case of 3D ICs, power constraints alone cannot take into account the complex thermal interaction between layers [9].

Rosinger et al. explored a method for rapid generation of thermal-safe test schedules by applying a clique cover algorithm to a resource conflict graph (RCG) [10]. They also used a thermoresistance model to compute thermal profiles quickly but with less accuracy. Cho et al. proposed an algorithm for scan vector ordering, PEAKASO [11], to minimize the peak temperature during scan testing. This method used a window-based power analysis to predict hotspots in a given scan-chain-stitched circuit, taking the corresponding scan vectors as input. The techniques mentioned above used coarse-grained thermal models that cannot accurately model temporal and spatial temperature dependencies [5], which is necessary in the case of 3D ICs due to inter-layer thermal interaction.

Skadron et al. proposed HotSpot [12], an accurate temperature model for planar ICs, and later extended it to account for 3D circuits as well. This included features such as Through-Silicon-Via (TSV) modeling. Liu et al. proposed a thermal-aware test scheduling scheme based on a rectangle packing heuristic that ensured that the heat generated during test is evenly spread over the chip, thereby minimizing the occurrence of hotspots on the chip [13]. They assumed that a single long test can cause a thermal violation [14]. Thus, they proposed partitioning test sets into test subsequences, with cooling periods introduced between the application of these subsequences to the chip. In the context of testing multicores using this technique, the test application to different cores was interleaved such that the cooling period reserved for one core could be utilized for test application to other cores. Bild et al. developed an optimal mixed integer linear programming formulation to generate thermal-safe test schedules [15]. A static test scheduling scheme using shift frequency scaling was proposed in Ref. [16]. Yao et al. exploited the superposition property of thermal models, such as HotSpot proposed in Ref. [12], to rapidly compute the thermal profiles of the given SoC design [17]. These profiles were further used to generate a partition-based thermal-aware test schedule for the given SoC.

The limitation of the above methods is that they require generation of static test schedules that need to be verified using thermal simulation tools. As mentioned earlier, computation of such thermal profiles at the design stage of a chip can be inaccurate and time-consuming, and can require generation of a large number of schedules from which the optimal one is chosen. This motivates the need for dynamic test scheduling using on-chip thermal sensors to ensure thermal safety during actual test application on the chip, especially in the case of system/online testing.

On-chip sensors have traditionally been used in task scheduling during functional operation [18]. However, there is a fundamental difference between task and test scheduling.


In task scheduling, a task can be assigned to any core and can be moved from the queue of one core to another. In test scheduling, however, a given test must run on its designated core. Hence, thermal-aware task scheduling techniques cannot be used directly for test scheduling.

Yao et al. proposed a dynamic test scheduling scheme [19] that used the static test schedules developed in Ref. [17]. The test partitions were applied one after another as per the computed static schedule. Before applying a test partition to a chip, the current temperature profile of the chip was read from on-chip temperature sensors and used to predict possible thermal violations for the next partition. If a thermal violation was predicted, the application of the test partition was delayed and the chip was allowed to cool. The limitations of this technique are its dependency on static schedules and the inaccuracy that external environmental conditions introduce into the prediction of possible thermal violations at each stage. A major drawback of the technique is that, once a test partition is scheduled and applied, no corrective action can be taken during its application. Hence, an inaccurate prediction can lead to a thermal violation. The accuracy of the prediction can be high in a controlled environment such as the one offered by a tester. In addition, at the manufacturing test stage, each die of a 3D chip would be tested separately in a tester, where the heat dissipation ability is better because the environment is controlled. When employing the technique in Ref. [19], the bigger problem of thermal safety arises during system test or online test after 3D integration, wherein the initial temperature and environmental conditions are not known a priori.

2.1. Contribution of This Paper

In this paper, we propose a dynamic test scheduling mechanism for system tests and/or online tests that uses dynamic feedback from on-chip thermal sensors to control temperature during shift (or scan) and capture cycles, thereby ensuring thermal-safe conditions while applying the test patterns. The proposed technique is a closed-loop test application scheme that eliminates the need for separate thermal simulation of test patterns at the design stage. The technique also enables granular field-level configuration of the thermal limits, so that different units across multiple cores can be subjected to different thermal profiles. For example, the cores that are farthest from the heat sink (topmost layer of the 3D chip) may be allowed a lower ambient temperature, while those closest to the heat sink (bottom layer) may be allowed a higher ambient temperature. Results from an implementation of the proposed schemes on an OpenSparc S1 processor subsystem are presented. The results obtained motivate the use of modular test compression techniques (such as Modular TestKompress), which have the potential not only to reduce the test time but also to establish fine-grained thermal control over different modules inside a core. Modular TestKompress with stored patterns is used in this paper for system test due to high test coverage requirements, similar to Ref. [20]. It may be noted that the proposed technique is equally applicable to Logic BIST tests.

To the best of our knowledge, the dynamic test scheduling technique proposed in this paper is the first in the literature that ensures thermal safety without the need for thermal simulation. It is also the first thermal-safe dynamic test scheduling technique reported in the literature for system/online tests. The proposed methodology is presented in the next section.

3. PROPOSED METHODOLOGY

This section presents the proposed thermal-aware dynamic test scheduling for 3D MPSoCs. As this is an on-chip solution, it needs to be simple. The following are the assumptions:
• There exist on-chip thermal sensors that can be read to obtain the temperature of the different parts of the given chip. This is a realistic assumption, as many current chips (e.g., the IBM POWER7 microprocessor) have thermal sensors to observe the temperature in functional mode [21].
• For every set of scan chains that needs to be controlled, three temperature thresholds, namely, Lower Limit (LL), Mid Upper Limit (MUL) and Upper Limit (UL), may be assigned. In the case of a processor core to be tested, these values can be set in the BIOS and read into the system at test time.
• For every set of scan chains that needs to be controlled, there exists corrective-action hardware that implements the state machine shown in Figure 2. Figure 3 illustrates one such controller.
• The shift frequency is scalable. In other words, the shift frequency can be set to F (normal frequency), F/2 or 0 (no shifting). This can be achieved using simple counter logic. It is assumed that the required clock-divider PLL [16] and clock-gating logic are available in the design to divide or gate off the shift clock, respectively.

3.1. The Corrective Action

This section explains the state machine shown in Figure 2, which implements the corrective action to control the temperature. The state machine is in one of three states, namely, N (Normal), H (Half Frequency), and C (Cooling). A chip can have multiple such state machines implemented as on-chip hardware. Each state machine controls one or more scan chains that shift test patterns to some part of the chip. Every state machine sets the frequency at which test patterns are shifted into the part of the chip that it controls, based on the temperature of that part. To start with, if the temperature is below the Mid Upper Limit (MUL), the state machine is in state N in Figure 2 and shifts the test patterns at the normal frequency F. When the temperature crosses MUL, the state machine moves to state H and shifts the test patterns at half the frequency. It continues to do so as long as the temperature is above the Lower Limit (LL) and below the Upper Limit (UL). At this stage, if the temperature falls below LL, the state machine resumes shifting the patterns at the normal frequency (returns to state N). If the temperature crosses UL, the state machine stops shifting the patterns (moves to state C), thereby allowing that part of the chip to cool. Once the temperature falls below LL, the state machine again starts shifting the patterns at the normal frequency F (returns to state N).

Fig. 2. The corrective action state-machine.

3.2. DFT Considerations for Fine-Grained Thermal Control

The use of multiple thermal sensors to monitor the temperature of various units within a processor core is becoming increasingly common. For example, the IBM Power7 chip with eight processor cores contains 44 on-chip thermal sensors, with five sensors per processor core [21]. In this section, we present DFT considerations to leverage the presence of multiple thermal sensors to perform fine-grained thermal control during online and system test.

(1) Scan Compression Architecture: The design is partitioned into multiple logical units for test, with each unit

Fig. 3. Thermal-safe test controller.

having at least one thermal sensor, and each partition having a separate clock. Scan insertion is done bottom-up. Scan insertion at each unit comprehends the compression entitlement of the design to control the length of the longest scan chain. Modular test compression with multiple scan compression codecs is implemented in the design, with each partition having a mini-codec within it. These may be either a native ATPG vendor tool solution such as Modular TestKompress [22] or multiple codecs implemented in the design such as the one described in Ref. [23]. This technique can be used for both inter-partition and intra-partition tests, with dedicated design-level scan pins used to drive/observe the codecs of each partition. One point worth observing in such a scheme, with independent access to the codecs of various partitions, is that inter-partition tests require the capture cycles of all the partitions to be synchronized so that the entire response can be captured from the last shift state of all partitions. On the other hand, this requirement does not apply to intra-partition tests, as long as the partitions are well wrapped/isolated from each other [24] and intra-partition test patterns are generated with all inputs constrained to Xs (unknowns). The online/system test controller in this paper is similar to the online field test controller described in Ref. [25], with the following variations: (a) pattern stimuli from Modular TestKompress are stored in on-chip ROM; (b) control signals from the thermal-safe test controller to the clocks are based on the temperature value sensed by the thermal sensor and the dynamic test scheduling policy. It may be noted that while this paper presents results with the system test controller supporting Modular TestKompress with patterns stored in on-chip ROM, the proposed technique also supports LBIST-driven system tests or low-power LBIST using a weighted LFSR. Additional considerations to support these BIST schemes are presented below.

(2) Fine-grained thermal control: Figure 3 illustrates the block diagram of the thermal-safe test controller. The slice extractor triggers the temperature sensor to capture the temperature code. The slice extractor is configurable to control the granularity of temperature control, such as multiple measurements within a pattern or one measurement every few patterns. The key factors behind the choice of slice granularity are the length of the longest chain and the thermal time constant of the wafer/packaging material. The temperature sensor (which is the same as the functional-mode temperature sensor) encodes the temperature as a digital value. The comparator compares the encoded temperature with the pre-defined thresholds and identifies the action based on the thermal policy illustrated in Figure 2. The clock control unit implements the action, such as clock divide-by-2 or clock gate-off. With each partition having a separate thermal-safe test controller, the codec of each partition independently decides how to control its shift clock. This implies that, for a particular pattern, each partition shifts the test data independently.


Fig. 4. Interaction between system test controller and thermal-safe test controller.

In such a situation, it is important for the system test controller of the design to ensure consistency. Figure 4 illustrates the interaction between the thermal-safe test controller of each partition and the system test controller. The output clock from the thermal-safe test controller controls the clock of all the scan flops in its partition. The system test controller keeps track of the progress made by each partition using the FSM_CTRL feedback, so that its internal state machine incorporates the thermal-aware clocking changes performed by the thermal-safe test controller. Extending this to conventional LBIST involves very similar clocking changes to ensure that the LBIST controller remains in sync with the thermal-safe test controller. Extending the proposed scheme to low-power LBIST (such as a weighted LFSR [7] with varying toggle rates) is also interesting: in lieu of clocking changes, the BIST controller would move to one of the low-switching-activity modes. This aspect of controlling switching activity using weighted LFSR/low-power LBIST is not analyzed further in this paper and is left for future work.

For a scan pattern, a gated sub-chains technique was proposed in Ref. [26], where multiple partitions were shifted separately to reduce power. We use a similar scheme, where the shift of each partition is determined by the thermal-safe test controller within that partition, to control the temperature at a more granular level. In this technique, the modified system test controller keeps track of the progress (in terms of shift cycles completed) of the various partitions, using the FSM_CTRL signals of each partition. Only in the case of inter-partition tests are the capture cycles of all partitions synchronized to ensure that all partitions have completed shift. In this case, all units capture simultaneously and move to the next pattern. This synchronization is crucial only for inter-partition tests.

In the absence of such synchronization between partitions, the system test controller can exploit the independence and parallelism between units to schedule tests dynamically, based on the thermal constraints of the various partitions at a finer granularity, while ensuring maximum parallelism and thereby possibly achieving higher throughput (i.e., test time reduction).

In an online test environment, time-slots are provided during functional operation to perform system test. The functional context is saved (restored) before (after) the test, and the test context is also saved after the test to ensure that the test continues from the subsequent set of patterns in later time-slots. It may be noted that while synchronization simplifies such an online test environment, it necessitates storing the last test pattern state at the end of each time-slot. In the absence of such synchronization between partitions (for test time reduction with intra-partition tests), additional book-keeping storage is required to track the completed tests of each partition, so that the test context can be stored at the end of each test time-slot during functional operation.

(3) Fine-grained thermal control with power gating: The impact of power gating on the throughput of multi-core processors was analyzed in Ref. [27], wherein the authors showed that turning off cores dynamically when the parallelism in the tasks reduces increases the throughput of the system by 14%. We use this motivation and extend the fine-grained thermal control by using power gating to turn off cores during their cooling period, when the clocks are turned off by the thermal-safe test controller.

The major challenge is to prove the efficiency of the proposed methodology. This is achieved by implementing the proposed technique using a simulation framework on a 4-layer, 16-core, 12.8-million-gate OpenSparc S1 processor subsystem design. The implementation, results and their interpretation are presented in the remaining sections of this paper.
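The corrective-action policy of Section 3.1 can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the on-chip hardware: the names `State`, `next_state` and `run` are hypothetical, threshold crossings are taken as strict inequalities, and state N is assumed to step to H before it can reach C, since the text does not describe a direct N-to-C transition. The patterns-per-interval figures (10 at F, 5 at F/2, none while cooling) follow the sampling setup used later in the experiments.

```python
from enum import Enum


class State(Enum):
    N = "normal"    # shift clock runs at F
    H = "half"      # shift clock runs at F/2
    C = "cooling"   # shift clock gated off


# Patterns shifted per sampling interval in each state
# (10 at F, 5 at F/2, none while cooling).
PATTERNS_PER_INTERVAL = {State.N: 10, State.H: 5, State.C: 0}


def next_state(state, temp, ll, mul, ul):
    """Return the next controller state for one sampled temperature."""
    if state is State.N:
        # Assumption: N always steps to H first; a direct N -> C jump
        # is not described in the text and is not modeled here.
        return State.H if temp > mul else State.N
    if state is State.H:
        if temp > ul:
            return State.C
        if temp < ll:
            return State.N
        return State.H
    # State.C: keep the clock gated until the temperature drops below LL.
    return State.N if temp < ll else State.C


def run(temps, ll=100.0, mul=120.0, ul=140.0):
    """Replay sampled temperatures; return visited states and patterns shifted."""
    state, states, shifted = State.N, [], 0
    for temp in temps:
        state = next_state(state, temp, ll, mul, ul)
        states.append(state)
        shifted += PATTERNS_PER_INTERVAL[state]
    return states, shifted


if __name__ == "__main__":
    states, shifted = run([110, 125, 145, 130, 99, 110])
    print([s.name for s in states], shifted)
```

One such model would be instantiated per controlled set of scan chains, each with its own LL, MUL and UL values, which mirrors the per-partition configurability described above.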

4. IMPLEMENTATION

In this section, we describe the experimental setup in detail. The chip under test is a symmetric Chip MultiProcessor (CMP) consisting of 16 openSPARC S1 [28] cores; the S1 is a reduced version of the openSPARC T1 [29]. While the openSPARC T1 is a multiprocessor with 8 cores, the openSPARC S1 has only one 64-bit SPARC v9 core and includes a Wishbone Master interface to connect with other cores [30]. The S1 core was synthesized using Synopsys Design Compiler (DC) with an industry-standard 65 nm standard cell library. The memory interface logic was tested using memory scan wrappers [31]. To enable scan testing, the core was scan-stitched with 100 scan chains using MGC's DFT Advisor. Subsequently, the scan-inserted core was passed through MGC's TestKompress to add Embedded Deterministic Test (EDT) logic to the design to aid test compression [32]. The EDT logic reduced the number of scan inputs from 100 to 8, and test patterns were generated for single stuck-at faults (SSF) at a maximum shift and capture frequency of 500 MHz for system test. The final EDT-inserted core consisted of 857 K gates and 668 K flip-flops


Fig. 5. 3D floorplan of 16 cores.

(spread across 100 scan chains) per core. Each core had 10 modules, 8 of which were logic modules and 2 of which were memory (Icache and Dcache). The setup used 16 such cores, 4 per layer on a 4-layer 3D chip, as shown in Figure 5. The floorplan of the 16 cores is based on the reciprocal design symmetry (RDS) scheme proposed in Refs. [33, 34], wherein the core in an upper layer is placed above the cache in the layer below to reduce thermal interaction between cores. The internal floorplan of a single OpenSparc S1 core is shown in Figure 6. These floorplans were used by the HotSpot tool for thermal analysis [12]. The thermal-safe test controller, implemented to incorporate the thermal policy, is a very small unit with only around 200 gates (for the slice extractor, comparator and clock control) when synthesized with DC at 500 MHz. HotSpot

Fig. 6. Internal floorplan of a single OpenSparc S1 core.

tool [12] was used for thermal analysis to verify and prove the effectiveness of the proposed scheme. It is worth noting that the detailed thermal simulations presented in this paper serve only to prove the effectiveness of the proposed scheme; such detailed thermal simulations of test patterns are not required for production designs. It is enough to verify that the closed-loop thermal control system generates the correct ACTION (or FSM_CTRL) to the system test controller for the various temperature codes provided by the thermal sensor.

One key problem faced is measuring dynamic and leakage power at various temperatures, while shifting test vectors and capturing responses, for thermal simulation. Memory libraries were generated using the CACTI tool [35] to obtain leakage power data for a given temperature. For the logic modules, leakage data at various granular temperatures were generated as follows. The cells in the standard cell library were available with characterization done at three temperatures, namely, −40, 25 and 125 degrees centigrade. The design was synthesized at the three different temperatures, and the average leakage power per standard cell module at each temperature was found. For every cell, a leakage-power function that determines the leakage power given a temperature value as input was computed using a curve-fitting procedure. Curves were fit according to the following sub-threshold current equation, as stated in Ref. [36]:

$$I_d = \mu_0 C_{ox} \frac{W}{L}\, c\, V_T^2\, e^{(V_{gs} - V_t + V_{ds})/(n V_T)} \left(1 - e^{-V_{ds}/V_T}\right)$$

$$P_L = A T^2 e^{-B/T}$$
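The curve-fitting step for the leakage-power form P_L = A T^2 e^{-B/T} can be sketched as a linear least-squares fit, since taking logarithms gives ln(P_L / T^2) = ln A − B (1/T), which is a straight line in 1/T. The sketch below is illustrative only: the function names are hypothetical, the sample leakage values are made up, and T is assumed to be absolute temperature in kelvin (the characterization corners are given in degrees centigrade, so they are converted first). The paper does not state which fitting procedure was actually used.

```python
import math

KELVIN_OFFSET = 273.15


def fit_leakage(temps_c, leak_w):
    """Fit P = A * T^2 * exp(-B / T) by least squares on ln(P / T^2) vs 1/T."""
    xs = [1.0 / (t + KELVIN_OFFSET) for t in temps_c]
    ys = [math.log(p / (t + KELVIN_OFFSET) ** 2) for t, p in zip(temps_c, leak_w)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -B
    intercept = mean_y - slope * mean_x  # equals ln(A)
    return math.exp(intercept), -slope


def leakage(a, b, temp_c):
    """Evaluate the fitted leakage-power function at temp_c (degrees C)."""
    t = temp_c + KELVIN_OFFSET
    return a * t * t * math.exp(-b / t)


if __name__ == "__main__":
    # Hypothetical leakage values (in watts) at the three library corners.
    corners_c = [-40.0, 25.0, 125.0]
    sample_leak = [0.002, 0.02, 0.4]
    a, b = fit_leakage(corners_c, sample_leak)
    print(a, b, leakage(a, b, 80.0))
```

With only three characterization corners the fit is lightly over-determined (two unknowns, three points), so the residual at each corner gives a quick sanity check on how well the assumed form matches the library data.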

The first step was to determine the granularity of temperature sampling during thermal simulation. This is also referred to as the frequency of slice extraction of the thermal-safe test controller described earlier. The sampling interval was fixed at 135 s, which translates to the time required to apply 10 test vectors when shifting at the normal frequency and 5 test vectors when shifting at half frequency. This sampling interval gave good precision for observing the effect of frequency control on the temperature, without drastically increasing the simulation time.

The thermal simulation flow used to evaluate and verify the proposed setup is illustrated in Figure 7 and is described in detail below. As mentioned earlier, every core has a thermal-safe test controller (TTC) that implements the state machine shown in Figure 2. The TTC of every core was initialized to state N (refer to Fig. 2). The LL, MUL and UL values for every TTC were then set. As mentioned earlier, these values are configurable and need not be the same for all cores. The HotSpot tool was initialized to an ambient temperature of 45 degrees centigrade (the default setting in HotSpot). For every core, the first set of 10 patterns was shifted, and the simulation was done using the Synopsys VCS Verilog simulator to generate the VCD (Value Change Dump) values. These VCD values were input to the Cadence RTL Compiler to compute the average

Fig. 7. Thermal-safe test simulation flow.

dynamic power for that time interval for each of the 8 logic modules per core. The frequency-scaling feature supported by the RTL Compiler was used to compute dynamic power during shift operation at varying shift-frequency values and during capture operation at the normal frequency value (500 MHz in our case). The power values computed for all the cores were input to the HotSpot tool, which in turn output the module-wise temperature for each core. The respective temperature values were fed into the CACTI tool and into the leakage-power functions for the logic modules to compute the leakage power. These leakage power values were input back to the HotSpot tool to compute any change in temperature. If there is a change in temperature, the leakage power values for the changed temperature are calculated and the process is repeated. If this iteration does not converge, a thermal runaway is detected. If the iteration converges, the maximum of the temperature values of all modules in a core is computed. This is the Tmax temperature for the core. It is input to the TTC of the core for thermal simulation, which decides on the shift frequency of the next set of test patterns as described by the state machine in Figure 2. If the TTC moves to state H (refer to Fig. 2), the simulation is done for the next 5 patterns, and if the TTC moves to state C, no simulation is done (the dynamic power is 0 for the next sampling interval). The above procedure mimics the functioning of the 16-core chip with the TTCs controlling the shift frequency per core. Note that, irrespective of the shift frequency, the capture frequency was 500 MHz. The default HotSpot settings were used for the other parameters involved in thermal simulation.
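The leakage/temperature iteration described above is a fixed-point loop, and a toy version makes the convergence and runaway cases concrete. The sketch below is not the HotSpot-based flow: it replaces the thermal simulator with a hypothetical single lumped thermal resistance `r_th` (so that T = T_amb + r_th · P_total), and the coefficients `a` and `b`, the tolerance and the runaway limit are all made-up illustration values. Only the 45 °C ambient and the form P_L = A T^2 e^{-B/T} come from the text.

```python
import math


def leakage_power(temp_c, a=0.02, b=3000.0):
    """Leakage power from P_L = A * T^2 * exp(-B / T), with T in kelvin.
    The coefficients a and b are hypothetical illustration values."""
    t = temp_c + 273.15
    return a * t * t * math.exp(-b / t)


def settle_temperature(p_dyn, r_th, t_amb=45.0, tol=0.01,
                       max_iters=100, runaway_limit_c=500.0):
    """Iterate temperature <-> leakage until the temperature stops changing.

    Returns the converged temperature in degrees C, or None when the
    iteration fails to converge, which is treated as thermal runaway.
    The lumped model T = t_amb + r_th * P is a stand-in for HotSpot.
    """
    temp = t_amb
    for _ in range(max_iters):
        total_power = p_dyn + leakage_power(temp)
        new_temp = t_amb + r_th * total_power
        if new_temp > runaway_limit_c:
            return None  # temperature is diverging
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    return None  # no convergence within max_iters


if __name__ == "__main__":
    print(settle_temperature(p_dyn=5.0, r_th=5.0))    # converges
    print(settle_temperature(p_dyn=5.0, r_th=50.0))   # runaway -> None
```

In the flow described in the text, the converged per-module temperatures would then be reduced to a per-core Tmax and handed to the TTC, which selects the shift frequency for the next sampling interval.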

5. RESULTS

This section presents the results obtained by performing the experiment described in the previous section. Figure 8 shows the maximum temperature during test of four different cores, each in a different layer of the 4-layer 3D chip. This experiment was conducted by setting the following thermal limits homogeneously for all the cores: LL = 100 °C, MUL = 120 °C and UL = 140 °C. As the cores are symmetric, the cores in the same layer have almost the same thermal profile. As seen in Figure 8, the temperature is in the range prescribed by the thermal

Fig. 8. Thermal profiles of various layers, using LL = 100  C, MUL = 120  C and UL = 140  C. X-axis: Each time interval is 13.5 s; Y -axis: Temperature in  C.

7

Thermal-Safe Dynamic Test Scheduling Method Using On-Chip Temperature Sensors for 3D MPSoCs Table I. Number of timing intervals to complete test with thermal limits LL = 100  C, MUL = 120  C and UL = 140  C. Layer

C1

C2

C3

C4

L4 L3 L2 L1

271 293 210 48

274 292 210 47

311 215 222 44

302 244 222 47

limits unlike what is illustrated in Figure 1 for a thermally uncontrolled environment. Table I shows the number of timing intervals (each timing interval is 13.5 s) required to complete testing of the 16 cores with a given set of test vectors. The testing for the cores in Layer 1 is finished within the first 48 timing intervals and the cores have started cooling beyond that point (refer Fig. 8), while that for the Layer 2 finished at 222 time intervals. This was followed by the cores in Layers 3 and 4 that took close to 300 time intervals. The reason for this behaviour is that the heat sink is closer to Layer 1 and hence the cores in that layer gets cooled quickly. It is also interesting to note from Figure 8 that the cores in Layer 2 and above heats up quickly but cools down slowly. The time to cool down increases with increasing layer numbers. Figure 9 illustrates the corrective action on shift-frequency as applied to cores belonging to different layers. It is interesting to note that the shift-frequency at Layer 1 level remained at 250 MHz (F /2 for major portion of the test time, while the others were oscillating between 0 (cooling period) and 500 MHz. Table II illustrates a trade-off between test application time and thermal profiles. It is interesting to note that with lower thermal profiles the test application time increases. As seen in Table II, in the scheme S2 (with a LL thermal limit of 80  C), for the cores in higher layers (Layer 2 and above) it takes double the time to apply 100 test patterns


Table II. Number of timing intervals to complete testing of all the 16 cores with 100 test patterns under two schemes: S1 with thermal limits LL = 100 °C, MUL = 120 °C and UL = 140 °C, and S2 with thermal limits LL = 80 °C, MUL = 100 °C and UL = 120 °C.

Scheme   Number of timing intervals
S1       L4: 39, 39, 43, 39
         L3: 40, 40, 32, 39
         L2: 41, 41, 41, 41
         L1: 18, 18, 19, 18
S2       L4: 92, 92, 97, 97
         L3: 91, 91, 83, 83
         L2: 85, 85, 88, 88
         L1: 25, 25, 25, 25

when compared with the time required to perform the same test on the cores using scheme S1 (with an LL of 100 °C). This is because the cores need more cooling time under scheme S2 than under scheme S1.

An experimental study of thermal runaway was carried out by simulating the 4-layer, 16-core OpenSparc system until it heated up to a significant temperature and then shutting down all the cores so that they dissipate leakage power alone. At this point a thermal runaway can happen if the increase in leakage power is large enough to increase the temperature and vice versa. So, if the temperature keeps increasing significantly after the shutdown, there is a thermal runaway. Table III summarizes the results obtained. As expected, Layer 4, which is farthest from the heat sink, enters thermal runaway at a lower temperature than the layers closer to the heat sink. Interestingly, no thermal runaway was possible in Layer 1, which is closest to the heat sink.

Figure 10 shows, for a core in Layer 4, the thermal profiles of the 8 logic modules as observed in the experiment shown in Figure 8. It is interesting to note that only a couple of the modules, like ffu (floating-point front-end unit) and lsu (load store unit), were mainly responsible for heating up that core, while the other modules had a thermal profile well within the mid upper limit (MUL) of 120 °C. Thus, when one of the modules crosses the upper limit, the other modules whose temperature is within the limits can still be tested rather than suspending the test of the entire core. This motivates per-module fine-grained thermal control. Units like the ffu and the lsu contain more than 10K scan-flops each and thus consume large dynamic power during test.

Fig. 9. Corrective action on shift-frequency for cores in various layers, using LL = 100 °C, MUL = 120 °C and UL = 140 °C. X-axis: Each time interval is 13.5 s; Y-axis: Frequency in MHz.

Table III. Thermal runaway of cores. Layer 4 farthest from heat sink; Layer 1 closest to heat sink.

Cores in Layer 4    145 °C
Cores in Layer 3    155 °C
Cores in Layer 2    195 °C
Cores in Layer 1    Runaway not possible

Fig. 10. Thermal profiles of modules in a Layer 4 core when simulated with LL = 100 °C, MUL = 120 °C and UL = 140 °C. X-axis: Each time interval is 13.5 s; Y-axis: Temperature in °C.

Proceeding further, Figure 11 shows the number of timing intervals required to apply a given set of test patterns (the same set as the one used to obtain the results shown in Table I, which used core-level thermal control) to the different functional modules of the cores spread across different layers. For all the modules except the ffu, the test patterns are applied within the first 175 timing intervals, which is only 56% of the time consumed by the core-level thermal control technique (refer Table I). However, the ffu module in Layer 4 took 350 timing intervals to complete, which is higher than that of per-core thermal control (refer Table I). The reason is that, in the per-core thermal-control scheme, when an ffu unit goes into a cooling state, the entire core to which the ffu belongs is in a cooling state. In the per-module scheme, the other modules in the core, whose temperatures are within the limits, continue to run, causing increased heat generation and delaying the cooling profile for ffu.

Fig. 11. Number of timing intervals to complete testing of all the 8 logic modules in the 16 cores, layer-wise, using a per-module thermal control scheme with thermal limits LL = 100 °C, MUL = 120 °C and UL = 140 °C. Y-axis: Each time interval is 13.5 s.

Fig. 12. Number of timing intervals to complete testing of all the 8 logic modules in the 16 cores, layer-wise, using a per-module thermal control scheme with power-gating and thermal limits LL = 100 °C, MUL = 120 °C and UL = 140 °C. Y-axis: Each time interval is 13.5 s.

This observation also presents potential scope to optimize further along three fronts: (a) more granular and balanced scan partitioning along with regionalized thermal sensor placement (in the current implementation, the ffu and lsu partitions together were close to half of the overall design); (b) optimizing the thermal control heuristic to also consider the neighbouring units/regions when deciding on the action (i.e., either divide or gate off the clock); and (c) layout optimization with thermal-aware floorplanning techniques that place such hot modules close to the periphery of the chip to enhance their cooling, which is also an interesting CAD problem to explore further. Similar thermal profiles may be experienced during functional mode.

We also analyzed the impact of per-module thermal control with power-gating as explained in Section 3. Figure 12 shows the results from simulation of such a scheme. The results show a reduction of 8.36% over the non power-gating version (results reported in Table I).

The Test Access Time (TAT) is sensitive to both the number of sensors and the thermal limits based on which the test controller operates. A summary of the TAT sensitivity analysis is as follows:

1. Sensitivity analysis versus number of sensors: In the per-core thermal control scheme, the effective number of sensors is 16, whereas in per-module thermal control, it is 160. Figure 13 shows the TAT for per-core versus per-module, and also gives the TAT for per-module with power-gating. The per-module scheme performs better than the per-core scheme for all modules except ffu. This is because, in the per-module scheme, neighbouring modules continue to generate heat, which delays the cooling profile of ffu. Per-core testing with power-gating shows an 8.36% reduction over per-core testing without power-gating.

Fig. 13. TAT sensitivity analysis versus number of sensors. PC: Per-Core; PM: Per-Module; ffu: floating-point front-end unit.

2. Sensitivity analysis versus thresholds to avoid thermal runaway: The lower the thresholds, the higher the TAT, as shown in Figure 14. This is because, with lower threshold limits, more time is required to cool down to the middle or lower limits.

Fig. 14. TAT sensitivity analysis versus thresholds to avoid thermal runaway.

Having fine-grained per-module thermal control enables interesting and important capabilities. One capability is to set lower thermal profiles for modules that are close to thermal-sensitive blocks like on-chip memory. It has been shown in Ref. [37] that both memory and logic junction temperatures may exceed 100 °C during functional modes of operation. Such high operating temperatures pose reliability and performance challenges when embedding/integrating memories into the processor/logic stack. With online tests being run between functional operations, it is necessary for the tests to consider the initial temperature of the cores at the end of functional operation (including those adjacent to memories) and also to ensure that thermal constraints are honoured at the end of test, so that the chip can move back to functional operation seamlessly. This would also be applicable when memory BIST tests are included as a part of system test. The second capability is to detect hardware Trojans, possibly inserted through procured IP cores on SoCs. A side-channel analysis can be performed that uses the amount of heat produced in different parts of the chip to detect the Trojans [38].

6. CONCLUSIONS

This paper presented a dynamic frequency-scaling based test scheduling scheme for 3D chip multiprocessors that uses on-chip thermal sensors to ensure thermal safety while shifting test vectors and capturing responses. The paper showed that dynamic test scheduling is unavoidable while performing system/online tests of large designs. A 12.8 million gate, 4-layer, 16-core OpenSparc S1 processor subsystem was used for this purpose. The paper illustrated through experimentation a trade-off between temperature profile and test turn-around time. An important inference derived from the experimental results presented in this paper is that providing per-module fine-grained thermal control can lead to improved test throughput. In addition, it can also enable important control and monitoring capabilities that can enhance the reliability and security aspects of the chip.
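The trade-off between thermal limits and test time summarised above can be checked with a short calculation. The sketch below is illustrative only: the numbers are the timing intervals as read from Table II and the 175 and 311 interval figures quoted for Table I and Figure 11, while the helper names `mean` and `slowdown` are hypothetical and not part of the paper.

```python
# Timing intervals per layer (four cores each), as read from Table II.
S1 = {"L4": [39, 39, 43, 39], "L3": [40, 40, 32, 39],
      "L2": [41, 41, 41, 41], "L1": [18, 18, 19, 18]}
S2 = {"L4": [92, 92, 97, 97], "L3": [91, 91, 83, 83],
      "L2": [85, 85, 88, 88], "L1": [25, 25, 25, 25]}

# Figures quoted in the text: non-ffu modules finish within 175 intervals
# under per-module control; the slowest core in Table I needs 311 intervals.
PER_MODULE_NON_FFU = 175
PER_CORE_WORST = 311


def mean(values):
    """Arithmetic mean of a list of interval counts."""
    return sum(values) / len(values)


def slowdown(layer):
    """Ratio of mean S2 test time to mean S1 test time for one layer."""
    return mean(S2[layer]) / mean(S1[layer])


for layer in ("L4", "L3", "L2", "L1"):
    print(f"{layer}: S2/S1 = {slowdown(layer):.2f}")

ratio = PER_MODULE_NON_FFU / PER_CORE_WORST
print(f"per-module (non-ffu) vs per-core: {ratio:.0%}")
```

Under these readings, the layers away from the heat sink (Layer 2 and above) take a little more than twice as long under S2 as under S1, while Layer 1 slows down by well under a factor of two, and 175 of 311 intervals corresponds to the 56% figure reported for per-module control.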

References

1. S. Borkar, Design challenges of technology scaling. Micro, IEEE 19, 23 (1999).
2. B. Black, M. Annavaram, N. Brekelbaum, J. DeVale, L. Jiang, G. H. Loh, D. McCaule, P. Morrow, D. W. Nelson, D. Pantuso, P. Reed, J. Rupley, S. Shankar, J. Shen, and C. Webb, Die stacking (3D) microarchitecture. International Symposium on Microarchitecture, IEEE/ACM (2006), pp. 469–479.
3. M. M. Sabry, A. K. Coskun, D. Atienza, T. S. Rosing, and T. Brunschwiler, Energy-efficient multiobjective thermal control for liquid cooled 3D stacked architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 30, 1883 (2011).
4. B. Pouya and A. L. Crouch, Optimization trade-offs for vector volume and test power. International Test Conference, IEEE (2000), p. 873.
5. K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan, Temperature-aware microarchitecture: Modeling and implementation. ACM Transactions on Architecture and Code Optimization 1, 94 (2004).
6. Z. He, Z. Peng, and P. Eles, A heuristic for thermal-safe SoC test scheduling. International Test Conference, IEEE (2007), pp. 1–10.
7. P. Girard, Survey of low-power testing of VLSI circuits. IEEE Design and Test of Computers 19, 80 (2002).
8. V. R. Devanathan, C. P. Ravikumar, and V. Kamakoti, Stochastic pattern generation and optimization framework for variation-tolerant, power-safe scan test. International Test Conference, IEEE (2007), pp. 1–10.
9. J. Choi, C. Y. Cher, H. Franke, H. Hamann, A. Weger, and P. Bose, Thermal-aware task scheduling at the system software level. International Symposium on Low Power Electronics and Design, IEEE/ACM (2007), pp. 213–218.
10. P. M. Rosinger, B. M. Al-Hashimi, and K. Chakrabarty, Rapid generation of thermal-safe test schedules. CoRR abs/0710.4797 (2007).
11. M. Cho and D. Z. Pan, PEAKASO: Peak-temperature aware scan-vector optimization. VLSI Test Symposium, IEEE (2006), pp. 52–57.
12. W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. Stan, HotSpot: A compact thermal modeling methodology for early-stage VLSI design. IEEE Transactions on Very Large Scale Integration VLSI Systems 14, 501 (2006).
13. C. Liu, K. Veeraraghavan, and V. Iyengar, Thermal-aware test scheduling and hot spot temperature minimization for core-based systems. International Symposium on Defect and Fault Tolerance in VLSI Systems, IEEE (2005), pp. 552–560.
14. Z. He, Z. Peng, and P. Eles, Simulation-driven thermal-safe test time minimization for System-on-Chip. Asian Test Symposium, IEEE (2008), pp. 283–288.
15. D. R. Bild, S. Misra, T. Chantem, P. Kumar, R. P. Dick, X. S. Hu, L. Shang, and A. N. Choudhary, Temperature-aware test scheduling for multiprocessor Systems-on-Chip. International Conference on Computer-Aided Design, IEEE/ACM (2008), pp. 59–66.
16. E. Tafaj, P. Rosinger, B. Al-Hashimi, and K. Chakrabarty, Improving thermal-safe test scheduling for core-based Systems-on-Chip using shift frequency scaling. International Symposium on Defect and Fault Tolerance in VLSI Systems, IEEE (2005), pp. 544–551.
17. C. Yao, K. K. Saluja, and P. Ramanathan, Partition based SoC test scheduling with thermal and power constraints under deep submicron technologies. Asian Test Symposium, IEEE (2009), pp. 281–286.
18. A. K. Coskun, T. S. Rosing, and K. Whisnant, Temperature aware task scheduling in MPSoCs. Design, Automation and Test in Europe, IEEE/ACM (2007), pp. 1659–1664.
19. C. Yao, K. Saluja, and P. Ramanathan, Thermal-aware test scheduling using on-chip temperature sensors. International Conference on VLSI Design, IEEE (2011), pp. 376–381.
20. A. Dutta, S. Alampally, V. Prasanth, and R. Parekhji, DFT implementations for striking the right balance between test cost and test quality for automotive SoCs. International Test Conference, IEEE (2008), pp. 1–10.
21. A. Gattiker, Invited paper: Yin and Yang of embedded sensors for postscaling-era. VLSI Test Symposium, IEEE (2011), pp. 324–327.
22. Modular TestKompress, Mentor Graphics Tessent TestKompress User Guide v9.0, June (2010).
23. A. Jain, S. Subramanian, R. A. Parekhji, and S. Ravi, Multi-codec configurations for low power and high quality scan test. International Conference on VLSI Design, IEEE (2011), pp. 370–375.
24. V. R. Devanathan, C. P. Ravikumar, and V. Kamakoti, Reducing SoC test time and test power in hierarchical scan test: Scan architecture and algorithms. International Conference on VLSI Design, IEEE (2007), pp. 351–356.
25. A. Dutta, M. Shah, G. Swathi, and R. A. Parekhji, Design techniques and tradeoffs in implementing non-destructive field test using logic BIST self-test. International Online Testing Symposium, IEEE (2009), pp. 237–242.
26. J. Saxena, K. Butler, and L. Whetsel, An analysis of power reduction techniques in scan testing. International Test Conference, IEEE (2001), pp. 670–677.
27. J. Lee and N. S. Kim, Analyzing potential throughput improvement of power and thermal-constrained multicore processors by exploiting DVFS and PCPG. IEEE Transactions on Very Large Scale Integration VLSI Systems 20, 225 (2012).
28. OpenSparc S1, URL: http://www.opencores.org.
29. OpenSparc T1, URL: http://www.opensparc.net.
30. Master Wishbone Interface, www.opencores.org/opencores, Wishbone.
31. V. R. Devanathan, A. Hales, S. Kale, and D. Sonkar, Towards effective and compression-friendly test of memory interface logic. International Test Conference, IEEE (2010), pp. 124–133.
32. J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, Embedded deterministic test. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 23, 776 (2004).
33. S. Alam, R. Jones, S. Pozder, and A. Jain, Die/Wafer stacking with reciprocal design symmetry (RDS) for mask reuse in three-dimensional (3D) integration technology. International Symposium on Quality Electronic Design, IEEE (2009), pp. 569–575.
34. D. C. Juan, S. Garg, and D. Marculescu, Statistical thermal evaluation and mitigation techniques for 3D chip-multiprocessors in the presence of process variations. Design Automation and Test in Europe, IEEE/ACM (2011), pp. 383–388.
35. P. Shivakumar and N. P. Jouppi, CACTI 3.0: An integrated cache timing, power, and area model. Western Research Lab Research Report (2001).
36. K. Roy and K. S. Yeo, Low-Voltage, Low-Power VLSI Subsystems. McGraw Hill, 1st edn. (2004), pp. 11–12.
37. D. Milojevic, H. Oprins, J. Ryckaert, P. Marchal, and G. V. der Plas, RAM-on-logic stack: Calibrated thermal and mechanical models integrated into pathfinding flow. Custom Integrated Circuits Conference, IEEE (2011), pp. 1–4.
38. M. Beaumont, B. Hopkins, and T. Newby, Hardware trojans: Prevention, detection, countermeasures (a literature review). Technical report DSTO-TN-1012, Defence Science and Technology Organization, Australia (2011).

Rama Kumar Pasumarthi
Rama Kumar Pasumarthi received his B.Tech. and M.Tech. degrees in Computer Science and Engineering from the Indian Institute of Technology Madras, India, in 2012. He is currently working as a Software Engineer at IBM India Research Labs, Bangalore, India.

V. R. Devanathan
V. R. Devanathan is a lead engineer in the ASIC division of Texas Instruments India. He received his B.E. in Computer Science from Bharathiar University, India, and his M.Tech. and Ph.D. degrees in Computer Science and Engineering from IIT Madras, India.

V. Visvanathan
V. Visvanathan received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology Delhi, New Delhi, India, the M.S.E.E. degree from the University of Notre Dame, Notre Dame, IN, and the Ph.D. degree in Electrical Engineering and Computer Sciences from the University of California, Berkeley. Since 2001, he has been with Texas Instruments, Bangalore, where he is currently a Distinguished Technical Staff Member and the Chief Technologist, Application Specific Integrated Circuit India.

Seetal Potluri
Seetal Potluri is a Ph.D. student in Electrical Engineering at the Indian Institute of Technology Madras. His areas of specialization include Low Power Digital VLSI Design and Test.

V. Kamakoti
V. Kamakoti is a Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Madras. His areas of specialization include Computer Architecture, VLSI Design and Test, and High-Performance Computing.
