REVIEW PAPER International Journal of Recent Trends in Engineering, Vol 2, No. 8, November 2009
A Tutorial and Survey on Thermal-Aware VLSI Design: Tools and Techniques
Saurabh Chaudhury
Department of Electrical Engineering, NIT Silchar, India
email:
[email protected]
Abstract--The greatest challenge facing VLSI designers today is the extremely high heat generation within a chip, which degrades not only performance but also yield and reliability. The situation has worsened with the evolution of multi-core processors that pack billions of devices onto a single chip, and with die-to-die temperature variations within the chip. This paper surveys state-of-the-art design methodologies, tools and techniques for estimating and minimizing heat dissipation within a chip. It emphasizes thermal problems at the layout level, including floorplanning, placement, routing and clock distribution. The goal of the paper is to highlight ongoing research in thermally balanced design for nanometer ICs and to make future VLSI designers aware of the challenges they will have to face.
Index Terms--thermal-aware VLSI, floorplanning, placement, routing, temperature gradient
I. INTRODUCTION
The growing demand for compact, high-performance devices has led to aggressive device scaling. This results in a tremendous increase in power per unit area of the chip, which is eventually dissipated as heat and raises the chip temperature, since scaled devices are expected to switch more frequently and to leak more. Uneven power distribution across the chip often results in hotspots (>100°C) with intra-die variations of 10-20°C. Temperature differences across the die can be as high as 50°C or even more. The power density within a chip has already reached an uncontrollable level and is expected to rise to 100 W/cm² at the 50 nm technology node. This poses an unprecedented challenge to today's designers.
Excessive temperature rise in the chip has many consequences. First, it increases leakage. Moreover, temperature and leakage are interdependent: subthreshold leakage current grows exponentially with temperature, which may lead to thermal runaway. Second, an increase in subthreshold and gate leakage means more IR drop in the power rails, which degrades switching delay and so hinders the performance of the chip. The switching delay of a logic circuit can increase by up to 30-40% for a Vdd drop of 10%. MOS current drive capability also decreases by approximately 4% for every 10°C increase in temperature. Third, higher power densities in a chip cause electromigration, which may open or short a circuit, making it behave abnormally and leading to failure of the chip. Thus yield and reliability are greatly affected. Power density translates directly into heat, and the life of an electronic device is also affected by its operating temperature: every 10°C rise in temperature reduces component life by 50%. So it is necessary to keep the device cool. Finally, interference between signal lines in the chip increases, so timing is strongly affected by crosstalk noise and by the larger temperature gradients between aggressor and victim nets. It has been reported that a temperature difference of 10°C between the aggressor and victim nets leads to a change in crosstalk-induced noise of the order of 25% [1]. Noise margin also varies with temperature. This leads to inaccurate estimation of signal delay and clock skew. Moreover, as chip temperature rises, interconnect resistance increases; typically, interconnect delay increases by 5% for every 10°C increase in temperature. The RC delay of different clock paths also varies, and the temperature-induced clock skew needs to be taken care of. Assuming a constant temperature when analyzing the electrical characteristics of a design would therefore be inaccurate. The power densities and packaging characteristics essentially determine the actual on-chip temperature distribution. Accurate power estimation, thermal profiling and knowledge of the packaging characteristics are therefore essential to design and analyze the system; only then is it possible to apply the appropriate techniques to obtain a thermal-aware design.
The paper is organized as follows. Section II reviews some of the reported heat dissipation and thermal models for accurate temperature estimation in the chip. Section III deals with thermal-aware design at the floorplanning level. Section IV looks into thermal-aware placement. Section V discusses routing and clock-tree distribution from a thermal point of view. Finally, Section VI gives a conclusion and the challenges facing future VLSI designers.
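The per-10 °C rules of thumb quoted in the Introduction can be turned into quick estimates. The following Python sketch is only illustrative: the function names are hypothetical, and compounding each rule per 10 °C step (rather than applying it linearly) is an assumption that the text does not specify.

```python
# Back-of-the-envelope estimates from the Introduction's rules of thumb.
# Assumption: each rule compounds per 10 °C step; all names are illustrative.

def life_factor(delta_t_c: float) -> float:
    """Fraction of component life remaining: halves for every 10 °C rise."""
    return 0.5 ** (delta_t_c / 10.0)


def interconnect_delay_factor(delta_t_c: float) -> float:
    """Interconnect delay multiplier: +5% for every 10 °C rise."""
    return 1.05 ** (delta_t_c / 10.0)


def drive_current_factor(delta_t_c: float) -> float:
    """MOS drive current multiplier: -4% for every 10 °C rise."""
    return 0.96 ** (delta_t_c / 10.0)


if __name__ == "__main__":
    for rise in (10.0, 20.0, 50.0):
        print(
            f"rise={rise:4.0f} °C  life x{life_factor(rise):.3f}  "
            f"wire delay x{interconnect_delay_factor(rise):.3f}  "
            f"drive current x{drive_current_factor(rise):.3f}"
        )
```

Under these assumptions, a 20 °C rise leaves a quarter of the nominal component life and lengthens interconnect delay by roughly 10%.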
© 2009 ACADEMY PUBLISHER
II. THERMAL MODELS
Power dissipation, temperature, performance and reliability are closely correlated and interact with one another. The key element of a temperature-aware design methodology is therefore a thermal model for estimating operating temperatures. Fig. 1 shows how a thermal model can act as a bridge for accurate power, performance and reliability estimation.
The HotSpot tool proposed by Skadron et al. [2] is a computationally efficient and easy-to-use thermal modelling tool for estimating thermal effects at the block level. It provides a simple compact model that accounts for the heat dissipated within each functional block and the heat flow among blocks. The basic idea is that if the thermal resistance and the power distribution of a given floorplan are known, the temperature of each block can be calculated as Tj = Rt × Pj, where Tj is the temperature of block j, Pj its power dissipation and Rt the thermal resistance.
The heat diffusion model was first proposed by Han et al. [4]. According to this model, heat diffusion between any two adjacent blocks is proportional to their temperature difference and to the length of the boundary they share. Because temperature is taken to be directly proportional to power density, the temperature difference can be replaced by the power density difference. Thus heat diffusion between two blocks can be expressed as
H(d1, d2) = (d1 − d2) × shared_length
where d1 and d2 are the power densities of the two blocks, and the total heat diffusion of a block with power density d is H(d) = Σi H(d, di), summed over its adjacent blocks.
The compact thermal model [2] is an extended version of the HotSpot tool [3], which was proposed earlier at the micro-architectural level in [5] and adopted in [6]. This compact thermal model is general and can therefore be applied in different contexts; dynamic thermal management (DTM), for example, is an active research area in the computer architecture community. Ref. [7] presents a typical temperature-aware design flow.
Fig. 1: Interaction among power, temperature, performance and reliability
III. THERMAL-AWARE FLOORPLANNING
Traditionally, the goal of a good floorplan is to minimize chip area, ease the subsequent routing phase and minimize wire length, and hence cost. However, with the growing complexity of VLSI chips, power density and the consequent heat dissipation are fast becoming a limiting factor in microprocessor design, because the heat generated within the chip must be quickly dissipated to the ambient to keep the chip relatively cool. Increasing heat dissipation thus increases the overall system cost. One way to reduce system cost is to keep the chip temperature below a certain limit. A good temperature-aware floorplan can address this thermal problem, since different floorplans give different maximum temperature rises in the chip. Quite a number of schemes have been proposed in this direction, alongside the traditional goals of minimizing area and routing cost.
Ref. [8] proposes a genetic-algorithm-based thermal-aware floorplanner that aims to reduce hotspots and distribute temperature uniformly across the chip, while also taking the traditional design goal of chip area into account. Power-aware design alone cannot address the temperature challenge, because the thermal profile depends not only on power density but also on the physical size and relative location of each functional block. Han et al. [4] demonstrated how different floorplans affect the maximum temperature of the chip: the temperature difference between floorplans of the same design can be as high as 30°C. Their model combines the traditional goals of area and wire-length optimization with the heat diffusion measure as an approximation to temperature, and the problem is solved using simulated annealing. Cai et al. [9] propose a thermal-aware floorplanning algorithm supporting voltage islands for low-power SoC design, in which not only the position but also the supply voltage of each module is determined. Thermal-aware floorplanning extended to the micro-architectural level is proposed in [10]. Schafer and Kim [11] propose a thermal-aware design and hotspot reduction technique starting from the gate-level netlist. Ning and Zhonghua [12] present a Gauss-Seidel method for thermal-aware floorplanning, an incremental iterative method for solving the thermal model of a hierarchical floorplan that can avoid hotspots in the chip design.
IV. THERMAL-AWARE PLACEMENT
In addition to the conventional goals of reducing chip area, easing routing and reducing cost, temperature is yet another constraint or goal during cell placement. Future VLSI chips must have a uniform temperature distribution, be free from hotspots and keep the maximum temperature rise within a safe limit. Smart temperature-aware placement at the layout level plays a significant role in achieving all these objectives. In this context, a number of efficient algorithms, techniques and methodologies have evolved in the past few years, and the search for novel techniques continues.
Li and Miyashita [13] present a thermal-aware placement algorithm for standard cells based on the Fiduccia-Mattheyses (FM) partitioning scheme. Siozios and Soudris [14] propose a novel methodology for temperature-aware placement and routing of FPGAs. Hung et al. [15] propose thermal-aware IP virtualization and placement for network-on-chip architectures. Hardware virtualization, which maps logic processing units onto processing elements (PEs), affects the power consumption of each PE, while communication among PEs affects overall performance and router power consumption, which depend on the placement of the PEs. Chen and Sapatnekar [16] propose a scheme to achieve better thermal distribution in partition-driven placement for standard-cell designs. The original compact thermal
model [2] has been simplified in the proposed scheme for partition-driven placement, which allows temperature to be used as a constraint in the inner placement loop for a better thermal distribution. A thermal-aware physical design methodology for 3-D ICs is presented by Cong and Zhang [17], who treat temperature as an additional optimization constraint at every step of physical design, including floorplanning, placement and routing. Ref. [18] presents the first physical design algorithms for thermal- and power-supply-noise-aware 3-D placement and crosstalk-aware 3-D global routing. Balakrishnan et al. [19] address the placement problem for 3-D ICs with the aims of thermal awareness and reduced wire congestion. Ref. [20] proposes a procedure for generating a thermal-aware 3-D placement from existing 2-D placement results by transformation. Jaffari and Anis [21] give a thermal-aware placement technique for FPGAs that reduces the maximum temperature and on-chip temperature gradients. They introduce a new cost function for the simulated annealing core of the placement tool, based on an electrostatic charge model instead of extracting the thermal profile at each simulation run, and claim better results with an algorithmic complexity linear in the number of logic blocks.
V. TEMPERATURE-AWARE ROUTING
As already mentioned, today's chips are subject to high temperature gradients because of aggressive scaling and the adoption of various low-power strategies such as dynamic power management, clock gating, sleep transistor insertion and transistor sizing. With ever-decreasing feature sizes, the global metal layers on which the clock signal is routed are getting closer to the substrate. Temperature gradients in clock distribution networks may therefore be induced by self-heating or by thermal coupling from the substrate or the metal layers underneath the clock. Thus routing is one of the most tedious tasks in modern high-performance, high-density chips. Today's chips include 3-D ICs in which dies are stacked together. This naturally gives rise to more global routing challenges, and such chips also suffer from thermal problems because of high power density and the high thermal resistance of the insulating dielectric. 3-D global routing must therefore be electro-thermally conscious.
To obtain a thermally balanced design and meet temperature constraints in ICs, Rossello et al. [7] present thermal-aware design rules for nanometer ICs. A thermally aware design methodology is presented in [1]. Zhang et al. [22] propose a temperature-aware 3-D routing (TA) technique that can effectively reduce temperature through appropriate allocation of thermal vias and insertion of thermal wires (using linear programming), with almost no peak temperature violations. It effectively resolves congestion violations and gives wire length comparable to other approaches. Pathak and Lim [23] propose a thermal-aware Steiner routing algorithm for interconnections in 3-D ICs.
Macii [24] presents a thermal-aware clock distribution network and argues that the clock skew induced by temperature gradients is no longer negligible. Moreover, buffer insertion in clock distribution networks has to be revisited to account for temperature effects. Delay in a clock tree is usually modeled by the Elmore delay model. In the presence of temperature gradients, the wire resistance R is no longer constant but becomes temperature dependent, as given by the expression R(x) = R0(1 + βT(x)). Thus the skew is no longer zero and depends on the temperature profile along the clock trace. Fig. 2(a) shows the location of the clock insertion point for zero clock skew under a uniform thermal profile, while Fig. 2(b) shows the modified clock insertion point (again with zero clock skew) under a thermal gradient, when the thermal profile increases linearly toward wire C.
Tsay [25] proposes an exact zero-skew clock routing algorithm using the Elmore delay model. The technique is essentially a recursive bottom-up algorithm for interconnecting two zero-skew sub-trees to form a new zero-skew tree. Cho et al. [26] propose a temperature-aware clock-tree optimization algorithm (TACO) that overcomes the drawbacks of previous algorithms, minimizing the worst-case clock skew in the presence of on-chip thermal variations while also minimizing wire length. They introduce the concept of a merging diamond and use an accurate ADI-based thermal simulation method [27] to feed back the thermal impact of the tree produced by TACO. Liu [28] presents an efficient and effective method (TMST) that combines hotspot-avoiding embedding with thermal-aware routing: the embedding keeps the tree topology out of areas likely to reach high temperature, and the routing reduces skew by steering tree paths through areas with a smoother temperature profile. With a thermally tolerable tree structure, the proposed method reduces not only delay skew but also skew variation, and is claimed to perform much better than TACO [26] and PECO [29].
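To see why a temperature gradient upsets a zero-skew clock tree, the Elmore delay of a single wire can be written out with the temperature-dependent resistance R(x) = R0(1 + βT(x)). The derivation below is a sketch under assumptions not stated in the text: r_0 and c_0 denote the wire resistance and capacitance per unit length at the reference temperature, C_L is a lumped load at the far end, the driver sits at x = 0, and the temperature profile is linear with slope g.

```latex
% Elmore delay of a wire of length L driven at x = 0 with load C_L at x = L
D = \int_0^L r_0 \bigl(1 + \beta T(x)\bigr) \bigl[c_0 (L - x) + C_L\bigr] \, dx

% Uniform profile, T(x) = T_0
D_{\mathrm{uni}} = r_0 (1 + \beta T_0) \left( \frac{c_0 L^2}{2} + C_L L \right)

% Linear profile, T(x) = T_0 + g x
D_{\mathrm{lin}} = D_{\mathrm{uni}} + r_0 \beta g \left( \frac{c_0 L^3}{6} + \frac{C_L L^2}{2} \right)
```

Two branches that are identical in length and load but see different slopes g therefore differ in delay by the second term, which is the temperature-induced skew.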
Figure 2(a): Clock insertion point for uniform thermal profile
Figure 2(b): Clock insertion point under thermal gradient (Courtesy: Enrico Macii, Politecnico di Torino)
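The shift of the clock insertion point illustrated in Fig. 2 can be reproduced numerically. The Python sketch below is not the method of [24] or [25]; it is a minimal illustration that assumes a single straight wire joining two sinks, the resistance model R(x) = R0(1 + βT(x)), and bisection on the tap position. All function names, parameter names and numeric values are hypothetical.

```python
# Sketch: zero-skew clock insertion (tap) point on a wire joining two sinks,
# with temperature-dependent resistance R(x) = R0 * (1 + beta * T(x)).
# All names and numeric values are illustrative assumptions.

def branch_delay(start, end, temp, r0, c0, beta, c_sink, steps=2000):
    """Elmore delay of the wire running from the tap at `start` to the sink at `end`.

    The wire is cut into `steps` short segments; each segment's resistance is
    multiplied by the wire and sink capacitance lying downstream of it.
    """
    span = abs(end - start)
    if span == 0.0:
        return 0.0
    dx = span / steps
    direction = 1.0 if end > start else -1.0
    delay = 0.0
    for i in range(steps):
        dist = (i + 0.5) * dx              # distance travelled from the tap
        x = start + direction * dist       # absolute position on the wire
        r_seg = r0 * (1.0 + beta * temp(x)) * dx
        c_down = c0 * (span - dist) + c_sink
        delay += r_seg * c_down
    return delay


def zero_skew_tap(length, temp, r0, c0, beta, c_left, c_right, tol=1e-9):
    """Bisection for the tap position in [0, length] that equalizes both sink delays."""
    lo, hi = 0.0, length
    while hi - lo > tol * length:
        p = 0.5 * (lo + hi)
        skew = (branch_delay(p, 0.0, temp, r0, c0, beta, c_left)
                - branch_delay(p, length, temp, r0, c0, beta, c_right))
        if skew > 0.0:      # left branch is slower, so move the tap left
            hi = p
        else:               # right branch is slower (or equal), move right
            lo = p
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    params = dict(r0=1.0, c0=1.0, beta=0.004, c_left=0.1, c_right=0.1)
    uniform = zero_skew_tap(1.0, lambda x: 40.0, **params)
    graded = zero_skew_tap(1.0, lambda x: 20.0 + 40.0 * x, **params)
    print(f"uniform profile tap:  {uniform:.4f}")
    print(f"linear gradient tap: {graded:.4f}")
```

With equal sink loads, the uniform profile places the tap at the midpoint, while a profile that heats up toward the right-hand sink moves the tap toward that hotter end, shortening the more resistive branch.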
VI. CONCLUSION
Thermal-aware VLSI design is of extreme importance today for 2-D and 3-D stacked devices, especially because power dissipation, temperature, performance and reliability interact with one another. The ever-increasing packing density in the nanometre regime calls for suitable means to balance the maximum die temperature and die-to-die temperature variations in the chip. Moreover, the maximum temperature rise in the chip should be brought down to a safe limit, which also lowers cooling cost. This paper has reviewed a number of efficient techniques for countering temperature-related issues, especially at the layout level. It can be inferred from the papers surveyed that thermal problems can be carefully controlled, and the chip temperature (or thermal stress) brought down to a safe limit, if they are targeted at the physical design level. Chip temperature can also be controlled at the system or board level, or by applying thermal management strategies such as adaptive thermal management (or DTM), design-time thermal management or package/system-based thermal management, but the focus of this paper is limited to physical design. Hopefully, the paper gives a glimpse of the ongoing research on thermally balanced design for nanometer ICs and of the challenges facing future VLSI designers.
REFERENCES
[1] “Thermally Aware Design Methodology,” Gradient Design Automation, 2005.
[2] Wei Huang, Mircea R. Stan, Kevin Skadron, Karthik Sankaranarayanan, Shougata Ghosh and Sivakumar Velusamy, “Compact Thermal Modelling for Temperature-Aware Design,” Proc. DAC 2004.
[3] http://lava.cs.virginia.edu/hotspot
[4] Yongkui Han, Israel Koren and Csaba Andras Moritz, “Temperature Aware Floorplanning,” Proc. Workshop on Temperature-Aware Computer Systems (TACS-2), held in conjunction with ISCA-32, June 2005.
[5] K. Skadron, M. R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan and D. Tarjan, “Temperature-aware microarchitecture,” Proc. ISCA-30, pp. 2–13, June 2003.
[6] J. Parry, H. Rosten and G. B. Kromann, “The development of component-level thermal compact models of a C4/CBGA interconnect technology: The Motorola PowerPC 603 and PowerPC 604 RISC microprocessors,” IEEE Trans. Components, Packaging, and Manufacturing Technology, Part A, 21(1):104–112, March 1998.
[7] Jose L. Rossello, Sebastia Bota, Marcos Rosales, Ali Keshavarzi and Jaume Segura, “Thermal-Aware Design Rules for Nanometer ICs,” Belgirate, Italy, 28-30 September 2005.
[8] W.-L. Hung, Y. Xie, N. Vijaykrishnan, C. Addo-Quaye, T. Theocharides and M. J. Irwin, “Thermal-Aware Floorplanning Using Genetic Algorithms,” Proc. ISQED 2005.
[9] Yici Cai, Bin Liu, Qiang Zhou and Xianglong Hong, “A Thermal Aware Floorplanning Algorithm Supporting Voltage Islands for Low Power SOC Design,” PATMOS 2005.
[10] K. Sankaranarayanan, S. Velusamy, M. Stan and K. Skadron, “A Case for Thermal-Aware Floorplanning at the Microarchitectural Level,” Journal of Instruction-Level Parallelism 8 (2005) 1-16.
[11] Benjamin Carrion Schafer and Taewhan Kim, “Hotspots Elimination and Temperature Flattening in VLSI Circuits,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 16, No. 11, November 2008.
[12] Xu Ning and Jiang Zhonghua, “Thermal Aware Floorplanning Using Gauss-Seidel Method,” Journal of Electronics (China), Vol. 25, No. 6, November 2008.
[13] Jing Li and Hiroshi Miyashita, “Thermal-Aware Placement based on FM Partition Scheme and Force-Directed Heuristic,” IEICE Trans. Fundamentals, Vol. E89-A, No. 4, 2006.
[14] Kostas Siozios and Dimitrios Soudris, “A Novel Methodology for Temperature-Aware Placement and Routing of FPGAs,” Proc. ISVLSI 2007.
[15] W. Hung, C. Addo-Quaye, T. Theocharides, Y. Xie, N. Vijaykrishnan and M. J. Irwin, “Thermal-Aware IP Virtualization and Placement of Network-on-Chip Architecture,” Proc. ICCD 2004.
[16] Guoqiang Chen and Sachin Sapatnekar, “Partition-Driven Standard Cell Thermal Placement,” Proc. ISPD, 2003, California.
[17] Jason Cong and Yan Zhang, “Thermal-Aware Physical Design Flow for 3-D ICs,” http://cadlab.cs.ucla.edu/three_d/3dic.html.
[18] Jacob Rajkumar Minz, Eric Wong, Mohit Pathak and Sung Kyu Lim, “Placement and Routing for 3-D System-On-Package Designs,” IEEE Trans. Components and Packaging Technologies, 2005.
[19] Karthik Balakrishnan, Vidit Nanda, Siddharth Easwar and Sung Kyu Lim, “Wire Congestion And Thermal Aware 3D Global Placement,” Proc. ASP-DAC 2005.
[20] Jason Cong, Guojie Luo, Jie Wei and Yan Zhang, “Thermal-Aware 3D IC Placement Via Transformation,” Proc. ASP-DAC 2007.
[21] Javid Jaffari and Mohab Anis, “Thermal-Aware Placement for FPGAs using Electrostatic Charge Model,” Proc. ISQED 2007.
[22] Tianpei Zhang, Yong Zhan and Sachin S. Sapatnekar, “Temperature-Aware Routing in 3D ICs,” Department of Electrical and Computer Engineering, University of Minnesota.
[23] Mohit Pathak and Sung Kyu Lim, “Thermal-aware Steiner Routing for 3D Stacked ICs,” Proc. of CAD 2007.
[24] Enrico Macii, “Thermal-Aware Clock Tree Design,” EDA Group, Politecnico di Torino.
[25] R.-S. Tsay, “An Exact Zero Skew Clock Routing Algorithm,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, pp. 242–249, Feb. 1993.
[26] Minsik Cho, Suhail Ahmed and David Z. Pan, “TACO: Temperature Aware Clock-tree Optimization,” Proc. ICCAD 2005.
[27] T. Wang and C. C. Chen, “3D Thermal-ADI: A linear-time chip level transient thermal simulator,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 12, Dec. 2002.
[28] Hao Yu, Yu Hu, Chunchen Liu and Lei He, “Minimal skew clock embedding considering time variant temperature gradient,” Proc. ISPD 2007.
[29] ChunChen Liu, Junjie Su and Yiyu Shi, “Temperature-Aware Clock Tree Synthesis Considering Spatio-temporal HotSpot Correlations,” Proc. ICCD 2008.