Adaptive Power Profiling for Many-Core HPC Architectures
Recommend Documents
of the power wall, processors do not scale any more towards higher frequencies. Instead, the major trend goes to the inte- gration of more and more processor ...
From the application programmer's point of view an in- vasive program will have to be written with respect to the resources currently available for his application.
parallel application workloads that run on these HPC resources can be difficult to understand without detailed workload energy consumption and power profiles.
May 20, 2016 - Low power; biomedical; manycore; accelerator; digital signal pro- cessing .... This implementation is optimized on the bit-width resolution in ad- dition to .... In order to demonstrate the manycore's effectiveness at targeting ..... d
A supercomputer application and software are usually much more long-lived than a hardware ... Next Generation Programmin
Mauricio Tsugawa, Thomas William, Andrew Younge, and the rest of the. FutureGrid .... Jens Voeckler, Renato J. Figueiredo, Jose Fortes, Kate Keahey, and Ewa ...
Threaded Architecture (MTA2) and FPGA-accelerated HPC architectures by incorporating performance attributes specific to these architectures within the MA ...
mostly by creating multi-level cache systems. ... frequency and to extract the instruction level parallelism. ..... organization, line-level writes, and wear-leveling.
for autonomic computing [5], [6]. In what follows, we explore the applications of this paradigm to support computational science. B. Conceptual Architectures for ...
Jun 17, 2005 - presents adaptive execution techniques to find the optimal execution ..... ing sequential execution time in 4P1L mode and parallel execution ...
thus more on-chip computing resources is ever increasing. Besides, ... The next level is an Adaptive Register file Computing (ARC) unit that can play the role of a ...
Aug 16, 2016 - and the proposed event-triggered decentralized control architecture ...... Calise, A.J. Hâ adaptive flight control of the generic transport model.
presented, and a tentative taxonomy of elements in self-adaptive ... Their study affects many areas within Software Engineering, to the extent that some.
Energy management and optimisation of equipment. (GEODE) describes a large electrical supply characterized by its very high availability. GEODE architecture ...
SCUOLA DI DOTTORATO IN INGEGNERIA. DIPARTIMENTO DI SCIENZE UMANE, SOCIALI E DELLA SALUTE. Multi-Inverter Architectures for High. Efficiency ...
Lin, Hengwei; Liu, Chengxi; Guerrero, Josep M.; Quintero, Juan Carlos Vasquez; Dragicevic, .... architecture allows plug-and-play functionality, which can.
network to a cycle-free hypergraph called a junction tree. ..... Comparison with baseline scheduling methods using junct
Ste Z39 18. Adobe .rotesvenao 7.0 .... Adobe Professional 7.0 ..... 9 December 2007 â IEDM Short Course: Emerging Nanotechnology and Nanoelectronics,.
the U.S. (DC), high voltage AC (HVAC) and high voltage DC. (HVDC) systems. ... out the requirement for a high speed load dispatch control. Power variations at ...
ABSTRACT. In this work, recent low-power decimation filter architectures of Sigma-Delta analog-to- digital converters are presented and an improved version is ...
computing memory power consumption. The video processing system being considered here assumes a 64 Mbit Rambus DRAM based design providing up to ...
Different network ar- chitectures, such as star, ring, mesh, torus, twisted torus and ...... R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld,. E. CooperBalls ...
performance tools rely on instrumentation and monitoring,. xSim focuses on an ..... 256 to 65,536 virtual MPI processes (VPs) with a fixed. 128 native MPI ...
properties in a simulated extreme-scale HPC system to further assist in HPC ... parallel discrete event simulation (PDES) that provides an. HPC application ...
Adaptive Power Profiling for Many-Core HPC Architectures
Many-Core HPC Architectures. Jaimie Kelley, Christopher Stewart. The Ohio State University. Devesh Tiwari, Saurabh Gupta. Oak Ridge National Laboratory ...
Adaptive Power Profiling for Many-Core HPC Architectures Jaimie Kelley, Christopher Stewart The Ohio State University Devesh Tiwari, Saurabh Gupta Oak Ridge National Laboratory
State-of-the-Art • Schedulers use profiles to assign resources. Machine 1
Workload Binary.exe Profile
Scheduler Machine N
2
State-of-the-Art • Schedulers use profiles to assign resources. Machine 1
• Profiles contain: – CPU usage – Memory usage – Disk usage
Scheduler Machine N Workload Binary.exe
3
State-of-the-Art • Schedulers use profiles to assign resources. Machine 1 Workload
• Profiles inform:
Binary.exe Scheduler
– Assigning only what is needed – Which apps strain datacenter resources
Machine N
4
State-of-the-Art • HPC workloads may use dozens of machines. Machine 1
• Traditional HPC workloads use every available core.
Workload Binary.exe Scheduler Machine N Workload Binary.exe
5
State-of-the-Art • HPC workloads may use dozens of machines. Machine 1
• Traditional HPC workloads use every available core.
Workload Binary.exe Scheduler Machine N Workload Binary.exe
• Using every core can be inefficient in terms of power.
6
Core Scaling Machine i Core 1
Core 2
…
Core N
• On how many cores should a workload execute? 7
Core Scaling Machine i Core 1
Core 2
…
Core N
• Few workloads use every core effectively. – PARSEC achieves 90% of the speedup of 442 cores with only 35 cores. – [H. Esmaeilzadeh et al., ISCA, 2011]
8
Profiling Peak Power With Idle Cores • Peak power is the largest aggregate power draw that lasts at least a tenth of a second. • In an offline setting, workloads are run to completion – Power readings are taken – Multiple runs on each number of active cores – Very slow 9
Profiling Peak Power Online • Common approaches to online profiling: – Use power readings from similar workloads • [C. Delimitrou et al., ASPLOS 2014]
– K% or fixed time (10 minutes) sampling • [H. Nguyen et al., ICAC 2013]
• Accurate profiles are critical to schedulers and capacity planning. – [O. Sarood et al., Supercomputing 2014]
10
Our Contributions • Empirical study of peak power on HPC – 5 observations about the interaction between architecture, workload, and power usage
• Novel approach to speedup peak power profiling with idle cores
11
Experimental Setup: Architectures Values
Cores L2 (KB)
L3 (MB)
Frequency (GHz)
TDP (W)
Size (nm)
Intel i7
4
256
8
3.4
95
32
Intel Sandy Bridge
8
256
20
2.6
115
32
Intel Xeon Phi
61
512
N/A
1.05
225
22
When cores idled, each used low-power modes: • Intel i7: On-Demand CPU governor • Intel Sandy Bridge: P-states • Intel Xeon Phi: P-states and C3 package 12
Experimental Setup: Workloads • 9 High Performance Computing kernels • Each benchmark runs on 2x cores. • For NAS, size B is appropriate for 1 node. Values
Features
CoMD
N-body molecular dynamics, Frequent inter-thread communication
Snap
PARTISN workload used at LANL, Iterative inter-thread communication
MiniFE
Finite-element workload composed from 4 parallelizable kernels
NAS benchmarks
Widely used benchmark suite (CG, EP, FT, IS, LU, MG)
13
Platform & Tools Platform
Features
MPI
Message passing interface, limited support for shared memory
OpenMP (OMP)
Open Multi-Processing provides extensive shared memory support
•
We used thread pinning to ensure which cores were active.
•
MPI & OMP exhibited similar peak power under micro-benchmarks ( 2% our method used a profiling duration less than 60%. 35