IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 31, NO. 1, JANUARY 2012
Guest Editorial Special Section on PAR-CAD: Parallel CAD Algorithms and CAD for Parallel Architectures/Systems
The advent of multicore architectures and systems has created an impetus for developing computer-aided design (CAD) algorithms and design automation (DA) tools specifically for such platforms. Indeed, most existing CAD flows and DA tools assume a sequential underlying computing paradigm and do not exploit the availability of parallel computing resources. Furthermore, such parallel architectures/systems require a complete revamp of DA tools, which need to provide scalable solutions for modeling, analysis, and optimization of multicore systems. As a consequence, a great deal of research has recently emerged with the aim of developing parallel CAD algorithms and CAD methodologies for parallel architectures and systems.
The goal of this Special Section on Parallel CAD Algorithms and CAD for Parallel Architectures/Systems hosted by the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS is to disseminate some of the best research ideas addressing the aforementioned challenges and trends. Released in late 2010, the solicitation for this special section attracted 20 submissions, of which five manuscripts of excellent quality were selected for publication after a rigorous review process involving more than 35 external reviewers with expertise in the field and 59 submitted reviews.
The first paper, entitled “SPICE2: Spatial processors interconnected for concurrent execution for accelerating the SPICE circuit simulator using an FPGA,” by Nachiket Kapre and André DeHon, demonstrates a parallel, field-programmable gate array (FPGA)-based, heterogeneous architecture customized for accelerating the SPICE simulator. The authors decompose SPICE simulation into three constituent phases (model evaluation, sparse matrix solve, and iteration control) and customize a spatial architecture for each phase.
The proposed heterogeneous FPGA organization mixes VLIW, dataflow, and streaming architectures into a cohesive, unified design to match the exposed parallel patterns. This approach outperforms conventional processors due to high utilization of statically scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, streaming, and overlapped processing of the control algorithms. The proposed approach is shown to offer significant runtime speedups when comparing a Xilinx Virtex-6 LX760 FPGA with an Intel Core i7 965 processor.
The second paper, entitled “Neural network-based thermal simulation of integrated circuits on GPUs,” by Arvind Sridhar, Alessandro Vincenzi, Martino Ruggiero, and David Atienza, proposes a new full-chip thermal modeling approach that can handle the scalability problem of transient heat flow simulation in large 2-D/3-D multiprocessor integrated circuits (ICs). The main idea put forth by the authors relies on exploiting neural networks (NNs) and the computational power of modern graphics processing units (GPUs) to efficiently determine the actual thermal profile of 2-D or 3-D ICs. This is achieved by parallelizing the computation-intensive task of transient temperature tracking using NNs in a massively parallel GPU implementation. The results on real-life benchmarks (e.g., the UltraSPARC Niagara floorplan) show a 35× runtime speedup compared to state-of-the-art IC thermal simulation tools, while keeping the estimation error below 1 °C.
The third paper, entitled “Exploiting parallelism for improved automation of multidimensional model order reduction,” by Jorge F. Villena and Luís M. Silveira, addresses the issue of automatically generating reduced-order models of very large multidimensional systems arising from the modeling of linear circuits and physical processes such as heat conduction. The authors introduce an efficient parallel projection-based model order reduction framework for parameterized linear systems. In particular, they develop an automated multidimensional sample selection procedure that maximizes effectiveness in generating the projection basis for the reduction of parameterized linear systems. The parallel nature of the algorithm can also be efficiently exploited on both shared and distributed memory architectures. This leads to a highly scalable, automatic, and reliable parallel reduction scheme for tackling challenging models that would otherwise be difficult to address with existing sequential approaches.
The fourth paper, “SimPL: An effective placement algorithm,” by Myung-Chul Kim, Dong-Jin Lee, and Igor L. Markov, presents a self-contained, flat, quadratic global placer that is simpler than existing placers and easier to integrate into timing-closure flows. The proposed algorithm has several appealing features. It maintains lower-bound and upper-bound placements that converge to a final solution. The upper-bound placement is produced by a novel lookahead legalization algorithm. The SimPL algorithm compares very favorably with several other high-performance placers in terms of runtime and solution quality. Furthermore, the new algorithm is amenable to parallelization. The authors report empirical studies using SSE2 instructions and up to eight parallel threads.
Date of current version December 21, 2011. Digital Object Identifier 10.1109/TCAD.2011.2175038
0278-0070/$26.00 © 2011 IEEE
Finally, the paper entitled “Accelerating FPGA routing through parallelization and engineering enhancements,” by Marcel Gort and Jason H. Anderson, presents parallelization and heuristic techniques to reduce the runtime of FPGA negotiated congestion routing. The proposed approach encompasses two heuristic optimizations that provide over 3× speedup versus a sequential baseline, and a parallel technique that assigns sets of design signals to different processor cores for concurrent routing. A geographic partitioning of signals into independent sets is developed to help minimize the communication overhead. When combined, the parallel and heuristic techniques provide over 7× speedup with four cores versus the router in the widely used VPR FPGA placement/routing framework, with no significant impact on circuit speed or wirelength.
We would like to thank all reviewers for their valuable contributions to the review process. Without their comments and constructive feedback, it would have been impossible to assemble this high-quality special section. Furthermore, we would like to thank the Deputy Editor-in-Chief Vijaykrishnan Narayanan and the Editor-in-Chief Sachin S. Sapatnekar for launching this special section and providing valuable advice, support, and feedback throughout the process.
We hope that you enjoy this selection of forward-looking papers and find them inspiring for your own work on parallel CAD algorithms, CAD for parallel architectures and systems, or other related research topics.
DIANA MARCULESCU, Guest Editor
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213 USA
(e-mail: [email protected])
PENG LI, Guest Editor
Department of Electrical and Computer Engineering
Texas A&M University
College Station, TX 77843 USA
(e-mail: [email protected])
Diana Marculescu (S’94–M’98–SM’09) received the Dipl.Ing. degree in computer science from the Politehnica University of Bucharest, Bucharest, Romania, in 1991, and the Ph.D. degree in computer engineering from the University of Southern California, Los Angeles, in 1998. She is currently a Professor of electrical and computer engineering with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA. Her current research interests include sustainable and energy-aware computing, reliability and variability-aware system design, and modeling and analysis of biological systems. Dr. Marculescu was the recipient of the National Science Foundation Faculty Career Award from 2000 to 2004, the ACM-SIGDA Technical Leadership Award in 2003, the Carnegie Institute of Technology George Tallman Ladd Research Award in 2004, and the Best Paper Awards from the IEEE Asia South-Pacific Design Automation Conference in 2005, the IEEE International Conference on Computer Design in 2008, the International Symposium on Quality Electronic Design in 2009, and the IEEE Transactions on Very Large Scale Integration (VLSI) Systems in 2011. She was an IEEE-Circuits and Systems Society Distinguished Lecturer from 2004 to 2005 and the Chair of the Association for Computing Machinery (ACM) Special Interest Group on Design Automation from 2005 to 2009. She is a Senior Member of ACM. She has served as the Technical Program Chair of the ACM/IEEE International Workshop on Logic and Synthesis in 2004, the ACM/IEEE International Symposium on Low Power Electronics and Design in 2006, and the General Chair of the same symposia in 2003 and 2007, respectively. She is currently the Technical Program Chair of the IEEE/ACM International Symposium on Networks on Chip in 2012 and is an Associate Editor of the ACM Transactions on Design Automation of Electronic Systems.
Peng Li (S’02–M’04–SM’09) received the B.E. degree in information engineering and the M.E. degree in systems engineering from Xi’an Jiaotong University, Xi’an, China, in 1994 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 2003. Since August 2004, he has been a Faculty Member with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, where he is currently an Associate Professor. His current research interests include very large scale integrated systems, electronic design automation, aspects of parallel computing, and computational neuroscience and biology. Dr. Li was the recipient of three Design Automation Conference (DAC) Best Paper Awards in 2003, 2008, and 2011, two SRC Inventor Recognition Awards in 2001 and 2004, the MARCO Inventor Recognition Award in 2006, the National Science Foundation CAREER Award in 2008, and the ECE Outstanding Professor Award from Texas A&M University in 2008. He is currently an Associate Editor of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems and the IEEE Transactions on Circuits and Systems—II: Express Briefs. He has served on the committees of many international conferences and workshops such as the International Conference on Computer-Aided Design, the Design Automation Conference, the International Symposium on Quality Electronic Design, the International Symposium on Circuits and Systems, and the Selection Committee for the ACM Outstanding Ph.D. Dissertation Award in Electronic Design Automation. He has served as the Technical Program Committee Chair of the ACM International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems in 2009 and the General Chair of the 2010 Workshop.