High Performance Computing in the Multi-core Area

2 downloads 0 Views 2MB Size Report
Jul 7, 2007 - Thomas Sterling: Multicore and Heterogeneous is Disruptive. Technology. Tsugio Makimoto: Makimoto's Wave predicts “Customized” for 2007 ...
High Performance Computing in the Multi-core Area

Arndt Bode Technische Universität München

ISPDC 2007 Hagenberg, Austria 07 July 2007

Technology Trends for Petascale Computing Architectures:

Multicore – Accelerators – Special Purpose – Reconfigurable – Memory and Cache – Network – Secondary Storage

Computing and Software: Programming Models – Heterogeneous vs. Homogeneous – Compilers and Threading – Operating Systems – Tools – Adaptivity Applications:

07 July 2007 © Arndt Bode

Scalability

ISPDC 2007 Hagenberg, Austria

1

Trends in Microprocessor Technology  Compute Performance of COTS Microprocessors depends of - Clock Frequency - Computer Organization  Clock Frequency: - Exponential Increase impossible (Power, Cooling, 135 W/Chip) - Partly Solutions: Clock- and Power reduction, Sleep Transistors, ... - New Goal for Optimization: Energy-Efficiency: MIPS, MFLOPS per W  Computer Organization: ILP, Processor-internal Organization fully exploited - Pipelining, Superscalar, VLIW, Wordlength, ..., contra-productive to EnergyEfficiency (Speculation) - Future is Processor/Thread level-Parallelism (Parallelism of Simpler Units) - Multi-Core and Application-specific Processors , Fault Tolerance

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

2

Energy - Efficiency

P ~ A C V2 f P: A: C: V: F:

07 July 2007 © Arndt Bode

Power Activity Factor (Active Transistors on Chip) Total Capacity Voltage Clock Frequency

ISPDC 2007 Hagenberg, Austria

3

Multi-Core and Multithreading

... P0

P1

...

P2

P7

0

1

4

5

8

9

28

29

2

3

6

7

10

11

30

31

FLC0

FLC1

FLC2

...

...

FLC7

SLC (gemeinsam) SLC (shared) Multicore-Chip

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

4

Multi-Core Architectures: A Parallel World for Everybody Compiler: Generation of parallel virtual processes virtual processes

VP0

VP1

...

VP2

...

VPk

Virtualization at System Level Load Balancing / Fault Tolerance / Placement Multi-Core Server MMX P0

P1 1

4

5

2

3

6

7

FLC0

...

P2

0

FLC1

8

DSP

9

10 11

FLC2

...

faulty

P7

MMX

P0

28 29

...

30 31

FLC7

P1

DSP

...

P2

0

1

4

5

8

2

3

6

7

10 11

FLC0

SLC (shared)

FLC1

9

FLC2

...

P7 28

29

30

31

FLC7

SLC (shared) Multicore-Chip 0

Multicore-Chip n

Memory

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

5

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

6

Processor-Design Options for Petaflop Systems  Mega-Multicore Systems: General Purpose Applications (Itanium, Xeon, Power-7, ... )  Multicore and Attached Accelerators (Cell, ClearSpeed, NVIDIA, ...)  Special Purpose (BlueGene, QCD, POLARIS, ...)  Reconfigurable and Adaptive (FPGA: continuous research)

General Purpose - Programmability - Scalability - Applicability History:

07 July 2007 © Arndt Bode

vs.

Special Purpose - High Performance - Low Power

Special Purpose „volatile“, Programming Interface not Compatible Microprogrammable Devices : Vertical Migration and Bitslice Early Microprocessors : Coprocessors (FP, IO, DSP, ...) Early HPC : Vector Processor Attachments, ArrayProcessors, Associative Processors ISPDC 2007 Hagenberg, Austria

7

Petaflop Processor Options: Bode’s Oracle Two Types of Systems will persist: General Purpose Mega-Multicore (Programmability, Compatibility, Cost) Special Purpose Systems for Large Specific Application Classes (Energy Efficiency) The Systems will be Highly Parallel Thomas Sterling:

Multicore and Heterogeneous is Disruptive Technology

Tsugio Makimoto:

Makimoto’s Wave predicts “Customized” for 2007 – 2017 (Pendulum: 1014 km)

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

8

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

9

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

10

Consequences for Computer Architecture Memory-Bottleneck Today:

Cache-Hierarchy

P

M

Multi-Core:

P

P

P

...

P

Bottleneck intensified

M

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

11

Cache Organizations Distributed

P0

Shared

Groupwise shared

P1

...

P2

...

P7

0

1

4

5

8

9

28 29

2

3

6

7

10 11

30 31

FLC0

FLC1

FLC2

...

...

FLC7

SLC (shared)

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

12

Communication Bottleneck classical SMP:

P

1 Interface / Proc. IC

P

M SMP based on Multi-Core P P ... P

P P ...

P

?

n Interfaces / Multi-Core IC (e. g.: n cores 5n Interfaces mesh-structure)

M

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

13

HLRB II: SGI ALtix 4700 / 9728 Montecito cores

11

07 July 2007 © Arndt Bode

m 22

m

ISPDC 2007 Hagenberg, Austria

14

Blade

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

Core 1

Itanium2 Itanium2 (Madison) Dual Core Socket

8.5 GB/s

Memory Controller (Shub 2.0)

6

B/s G .4

6 .4 G B/s

NL Port Plane 1

NL Port Plane 2

Core 2 DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

4 GByte per Compute Core 07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

15

Dual Socket Blades - Density Compute Blades Core 1

Itanium2 Dual Core Socket A

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

Core 2

8.5 GB/s Core 1

Itanium2 Dual Core Socket A Core 2 07 July 2007 © Arndt Bode

Memory Controller (Shub 2.0)

6

B/s G .4

6 .4 G B/s

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

DDR2 DIMM

NL Port Plane 1

NL Port Plane 2

4 GByte per Compute Core ISPDC 2007 Hagenberg, Austria

16

One 256 Socket Partition: ccNUMA Shared Memory Single System 32*6.4Image, GByte/s = Fat 204.8Tree Gbyte/sConnection

32*6.4 GByte/s = 204.8 Gbyte/s 2*0.8 Gbyte/s per Socket pair 32*6.4 GByte/s = 204.8 Gbyte/s 07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

17

2-D Torus Mesh topology. Rank

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

18

HLRB II Interconnect

Fat Tree Topology

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

19

Topology

Batch Scheduler must be topology aware PBSpro 8.0 early access Placements of jobs is optimized according to topology

Fat Tree

Topology is stored in placement sets

2D-Torus

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

20

Petascale Software Key Issue is use of Excess Parallelism:  Programming Model, Language and Compiler • • • •

Conventional Parallelizing (Including JIT) Parallelizing with Directives New Parallel Languages (Transactional Memory, ...)

 Use of Threads • Functional • Speculative • Assist/Helper (Prefetching, Monitoring, Debugging, Tools, Virtualization, Security, FT-Lockstep, ...)

 Scalable Tool Models 07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

21

You tell us!

This is the Academic Forum, so you tell us. What are the solutions? Where are the new language ideas? Can you design a statically checked race-free language which is useful? Can naїve users really use functional languages? Any language which talks about threads is too low level for most users. So how do we raise the language level? Should we be doing message passing inside the node?  It’s the only demonstrated way to achieve high scalability  Do we really need to bring back Occam? James Cownie

07 July 2007 © Arndt Bode



Intel Academic Forum, Budapest, 13.06.2007

ISPDC 2007 Hagenberg, Austria

22

Some Examples of Research at MMI

 Cache-Prefetching with Helper-Core (up to 39 %)  Cache-Behavior with shared/separate Caches  Helper Cores for Fault Tolerance

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

23

Prefetching Bandwidth obtainable for Matrix-Multiplication (Blocking with small amount of caching)

Matrix Dimension

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

24

Advantages of Shared Caches Synthetic Benchmark: Latency for continuous Writes onto the same Memory area (2 cores)

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

25

Lockstepping of Virtual Machines  Synchronized Identical Execution of Applications on 2 Processors  Virtual Environment allows for Monitoring and Control by Second Core  Minimal Performance Loss

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

26

Lockstepping of Virtual Machines

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

27

Bavarian Graduate School of Computational Engineering Elitenetzwerk Bayern Welcome to the Bavarian Graduate School

of Computational Engineering Computational Engineering ... refers to all activities in engineering that use computers as their main tool. Typical tasks in Computational Engineering are the solution of differential equations that model certain physical phenomena, the optimization of processes in engineering, or the stochastic simulation of a complex system.

The Bavarian Graduate School ... of Computational Engineering is an association of three Master programs: •Computational Engineering (CE) at the Friedrich-Alexander-Universität Erlangen-Nürnberg, •Computational Mechanics (COME), and •Computational Science and Engineering (CSE) at the Technische Universität München. Our goal is to push forward the rapidly growing field of Computational Engineering by offering high-quality master programs for students who are interested in the field.

The Elitenetzwerk Bayern ... is an initiative of the state of Bavaria to support the education and advancement of highly talented students. With the help of the Elitenetzwerk Bayern, we are able to offer an "elite" degree program for the best students in our master programs. Outstanding performance in one of the three Master's programs will be honoured by the newly formed academic degree of a "Master of Science with Honours". For further information about these elite program, please read on in our section "Infomation for students".

IGSSE BGCE is a partner of IGSSE, TUM's International Graduate School of Science and Engineering 07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

28

Summary

Multicore (and Heterogeneous): Disruptive Technologies Many Choices in System Architecture and Programming Models HPC Systems will be massively parallel: Scalability is Challenge for Application-Algorithms, System Software, Tools and Architectures Interesting Times Ahead!

07 July 2007 © Arndt Bode

ISPDC 2007 Hagenberg, Austria

29