Adaptive Power Profiling for Many-Core HPC Architectures

Jaimie Kelley, Christopher Stewart (The Ohio State University)
Devesh Tiwari, Saurabh Gupta (Oak Ridge National Laboratory)

State-of-the-Art

[Figure: a scheduler assigns Workload Binary.exe and its Profile across Machine 1 … Machine N]

• Schedulers use profiles to assign resources.
• Profiles contain:
– CPU usage
– Memory usage
– Disk usage
• Profiles inform:
– Assigning only what is needed (see the first-fit sketch below)
– Which apps strain datacenter resources
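A minimal sketch of the profile-driven placement described above, assuming a simple first-fit policy; the Profile and Machine structs and the first_fit() helper are illustrative names, not from the talk.

    #include <stddef.h>

    /* Hypothetical resource profile, matching the slide's contents. */
    typedef struct { double cpu, mem_gb, disk_gb; } Profile;
    typedef struct { double cpu_free, mem_free_gb, disk_free_gb; } Machine;

    /* First-fit: return the index of the first machine that can hold the
     * workload's profiled demand, or -1 if none can. Assigning only the
     * profiled amount leaves remaining capacity for other workloads. */
    int first_fit(const Profile *p, const Machine *m, size_t n) {
        for (size_t i = 0; i < n; i++)
            if (m[i].cpu_free >= p->cpu &&
                m[i].mem_free_gb >= p->mem_gb &&
                m[i].disk_free_gb >= p->disk_gb)
                return (int)i;
        return -1;
    }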

State-of-the-Art

[Figure: the scheduler places copies of Workload Binary.exe on Machine 1 … Machine N]

• HPC workloads may use dozens of machines.
• Traditional HPC workloads use every available core.
• Using every core can be inefficient in terms of power.

Core Scaling

[Figure: Machine i with Core 1, Core 2, …, Core N]

• On how many cores should a workload execute?
• Few workloads use every core effectively (see the sketch below).
– PARSEC achieves 90% of the speedup of 442 cores with only 35 cores [H. Esmaeilzadeh et al., ISCA 2011].
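One way to act on this observation: given measured speedups at each core count, pick the smallest count that reaches a target fraction of the best speedup. A sketch under assumed inputs; knee_core_count() is a hypothetical helper, not from the talk.

    #include <stddef.h>

    /* speedup[c] = measured speedup on c cores, for c = 1..n (index 0
     * unused). Return the smallest core count whose speedup reaches
     * `frac` (e.g. 0.9) of the best observed speedup. Mirrors the
     * PARSEC observation: 35 cores give 90% of the 442-core speedup. */
    size_t knee_core_count(const double *speedup, size_t n, double frac) {
        double best = 0.0;
        for (size_t c = 1; c <= n; c++)
            if (speedup[c] > best) best = speedup[c];
        for (size_t c = 1; c <= n; c++)
            if (speedup[c] >= frac * best) return c;
        return n;
    }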

Profiling Peak Power With Idle Cores

• Peak power is the largest aggregate power draw that lasts at least a tenth of a second (see the sketch below).
• In an offline setting, workloads are run to completion:
– Power readings are taken
– Multiple runs at each number of active cores
– Very slow
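The peak-power definition above maps directly to a sliding-window maximum over a sampled power trace. A sketch assuming samples arrive at a fixed interval dt seconds; the trace and dt are assumptions, not details from the talk.

    #include <stddef.h>

    /* Peak power per the slide's definition: the largest aggregate draw
     * sustained for at least 0.1 s. Average each 0.1 s sliding window
     * of the trace and take the maximum window average. */
    double peak_power(const double *watts, size_t n, double dt) {
        size_t w = (size_t)(0.1 / dt + 0.5);   /* samples per 0.1 s window */
        if (w < 1) w = 1;
        if (n < w) return 0.0;
        double sum = 0.0, best;
        for (size_t i = 0; i < w; i++) sum += watts[i];
        best = sum;
        for (size_t i = w; i < n; i++) {       /* slide the window */
            sum += watts[i] - watts[i - w];
            if (sum > best) best = sum;
        }
        return best / (double)w;               /* window average in watts */
    }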

Profiling Peak Power Online

• Common approaches to online profiling:
– Use power readings from similar workloads [C. Delimitrou et al., ASPLOS 2014]
– K% or fixed-time (10 minutes) sampling [H. Nguyen et al., ICAC 2013] (see the sketch below)
• Accurate profiles are critical to schedulers and capacity planning [O. Sarood et al., Supercomputing 2014].
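A sketch of fixed-duration sampling in the spirit of the second approach: poll a meter for a time budget and report the largest reading seen. read_power_watts() is a hypothetical stand-in for a real sensor (e.g. RAPL or an external meter); since the true peak may occur after the budget expires, the result is only an estimate.

    #include <unistd.h>

    double read_power_watts(void);  /* hypothetical; assumed provided */

    /* Poll the sensor every 100 ms for budget_s seconds and keep the
     * largest reading, instead of profiling the whole run. */
    double sampled_peak(double budget_s) {
        double best = 0.0;
        long polls = (long)(budget_s / 0.1);
        for (long i = 0; i < polls; i++) {
            double w = read_power_watts();
            if (w > best) best = w;
            usleep(100000);                /* 0.1 s between samples */
        }
        return best;                       /* lower bound on true peak */
    }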

Our Contributions

• Empirical study of peak power on HPC architectures:
– 5 observations about the interaction between architecture, workload, and power usage
• Novel approach that uses idle cores to speed up peak power profiling

Experimental Setup: Architectures

Architecture        Cores  L2 (KB)  L3 (MB)  Frequency (GHz)  TDP (W)  Process (nm)
Intel i7            4      256      8        3.4              95       32
Intel Sandy Bridge  8      256      20       2.6              115      32
Intel Xeon Phi      61     512      N/A      1.05             225      22

When cores idled, each architecture used low-power modes (see the sketch below):
• Intel i7: On-Demand CPU governor
• Intel Sandy Bridge: P-states
• Intel Xeon Phi: P-states and the C3 package state
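On Linux, a per-core governor setting like the i7's can be applied through the standard cpufreq sysfs interface. A minimal sketch, assuming root access and a kernel/driver that exposes scaling_governor; error handling is kept minimal.

    #include <stdio.h>

    /* Write a governor name (e.g. "ondemand") into the cpufreq sysfs
     * file for one core. Returns 0 on success, nonzero on failure. */
    int set_governor(int cpu, const char *governor) {
        char path[128];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                 cpu);
        FILE *f = fopen(path, "w");
        if (!f) return -1;
        fprintf(f, "%s\n", governor);
        return fclose(f);
    }
    /* e.g. set_governor(3, "ondemand"); */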

Experimental Setup: Workloads

• 9 High Performance Computing kernels
• Each benchmark runs on power-of-two core counts (2^x cores; see the sweep sketch below).
• For NAS, problem size B is appropriate for 1 node.

Workload        Features
CoMD            N-body molecular dynamics; frequent inter-thread communication
Snap            PARTISN workload used at LANL; iterative inter-thread communication
MiniFE          Finite-element workload composed from 4 parallelizable kernels
NAS benchmarks  Widely used benchmark suite (CG, EP, FT, IS, LU, MG)
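A sketch of the power-of-two core sweep implied by this setup, using standard OpenMP calls; run_kernel() is a hypothetical stand-in for one iteration of a benchmark such as CoMD, Snap, MiniFE, or a NAS kernel.

    #include <omp.h>
    #include <stdio.h>

    void run_kernel(void);  /* hypothetical: one benchmark iteration */

    /* Run the kernel at 1, 2, 4, ... cores up to the machine's count,
     * timing each configuration. */
    void sweep_power_of_two_cores(int max_cores) {
        for (int c = 1; c <= max_cores; c *= 2) {
            omp_set_num_threads(c);        /* active core count = c */
            double t0 = omp_get_wtime();
            run_kernel();
            printf("%d cores: %.3f s\n", c, omp_get_wtime() - t0);
        }
    }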

Platform & Tools Platform

Features

MPI

Message passing interface, limited support for shared memory

OpenMP (OMP)

Open Multi-Processing provides extensive shared memory support



We used thread pinning to ensure which cores were active.

• MPI & OMP exhibited similar peak power under micro-benchmarks (within 2%).
• Our method used a profiling duration of less than 60% of a full offline run.
