Multicore Organization

15 downloads 62 Views 803KB Size Report
Multicore Processors. Raul Queiroz Feitosa. Parts of these slides are from the support material provided by W. Stallings. Page 2. Multicore Computers. 2.
Multicore Processors Raul Queiroz Feitosa Parts of these slides are from the support material provided by W. Stallings

Objective

Objective

“This chapter provides an overview of multicore systems”. Stallings

Multicore Computers

2

Outline

Outline Hardware Performance Issues  Software Performance Issues  Multicore Organizations  Intel Core Architecture 

Multicore Computers

3

Hardware performance issues Chip Density Microprocessors performance increase in due to Pipelining  Superscalar  Multithreading … 

a)

Improved organization, e.g.,

b)

Increased clock frequency

both made possible by

1.

increasing chip density!

By 2015 → 100 billion transistors on 300mm2 die. Multicore Computers

4

Hardware performance issues Relative Performance

Multicore Computers

5

Hardware performance issues Relative Performance/cycle

Pollack’s rule: performance is roughly proportional to square root of increase in complexity 

2.

Double complexity gives 40% more performance

Diminishing gains with complexity increase! Multicore Computers

6

Hardware performance issues Power

3.

Power requirements grow exponentially with chip density and clock frequency! Multicore Computers

7

Hardware performance issues Increased Complexity

4.

Memory transistors have a power density one order of magnitude lower than that of logic. Multicore Computers

8

Outline

Outline Hardware Performance Issues  Software Performance Issues  Multicore Organizations  Intel Core Architecture 

Multicore Computers

9

Software Performance Issues small amounts of serial code impact performance 

According to Amdahl’s law

Speedup



time to execute time to execute

program program

on a single on N parallel

processor processors

1



1 

f



f N

where f is the fraction of code infinitely parallelizable with no schedule overhead.

Multicore Computers

10

Software Performance Issues Small amounts of serial code impact performance due to Communication, distribution of work and cache coherence overheads percentage of sequential code

Multicore Computers

11

Software Performance Issues More recently software Engineers have developed applications that effectively exploit multiprocessor architecture, e. g., database applications.

5.

New applications exploit multiprocessor architecture! Multicore Computers

12

Outline

Outline Hardware Performance Issues  Software Performance Issues  Multicore Organization  Intel Core Architecture 

Multicore Computers

13

Multicore Organization In view of: 1. 2. 3. 4. 5.

Increasing chip density. Diminishing gains with complexity increase, Power requirements grow exponentially with chip density and clock frequency. Memory transistors have a power density an order of magnitude lower than that of logic. Applications, which exploit multiprocessor architecture.

what to do with extra transistors made available by the semiconductor industry? Multicore Computers

14

Multicore Organization What to do with extra transistors made available by the semiconductor industry?  

Reduce complexity, so that multiple complete processors fit in a single chip Reduce clock frequency and increase the proportion of chip occupied by cache to reduce power requirements

Multicore Computers

15

Multicore Organization Main variable in a multicore organization:  Number

of core processors on chip  Number of levels of cache on chip  Amount of shared cache

Multicore Computers

16

Multicore Organization Alternatives

AMD Opteron

ARM11 MPCore

Intel Core Duo

Intel Core i7

Multicore Computers

17

Private × shared L2 Cache Advantages of shared L2 Cache 

 

Constructive interference reduces overall miss rate Data shared by multiple cores not replicated at cache level With proper frame replacement algorithms mean amount of shared cache dedicated to each core is dynamic 





Threads with less locality can have more cache

Easy inter-process communication through shared memory Cache coherency confined to L1

Advantages of private L2 Cache 

Dedicated L2 cache gives each core more rapid access

Shared L3 cache may also improve performance Multicore Computers

18

Outline

Outline Hardware Performance Issues  Software Performance Issues  Multicore Organization  Intel Core Architecture 

Multicore Computers

19

Intel Core Architecture Intel Core Duo uses superscalar cores Intel Core i7 uses simultaneous multithreading (SMT)  Scales 

up number of threads supported

4 SMT cores, each supporting 4 threads appears as 16 cores.

Multicore Computers

20

Intel x86 Core Duo Organization

Multicore Computers

21

Intel x86 Core Duo Organization Introduced in 2006 Two x86 superscalar, shared L2 cache Dedicated L1 cache per core implementing MESI protocol Protocol extended to accommodate multiple chips (SMP) Thermal control unit per core   

Manages chip heat dissipation Maximize performance within thermal constraints If temperature of a core exceeds a threshold, clock rate reduced.

Advanced Programmable Interrupt Controlled (APIC)   

Inter-process interrupts between cores Routes I/O interrupts to appropriate core. Each APIC includes a timer, so that OS can interrupt the local core. Multicore Computers

22

Intel x86 Core Duo Organization Power Management Logic   

Monitors thermal conditions and CPU activity Adjusts voltage and power consumption Can switch individual logic subsystems

2MB shared L2 cache   

Dynamic allocation MESI support for L1 caches Extended to support multiple Core Duo in SMP 

L2 data shared between local cores or external

Bus interface Multicore Computers

23

Intel Core i7 Organization Core 1

32 KB I&D L1 Caches

Core n

● ● ●

32 KB I&D L1 Caches

256 KB 256 KB L2 Caches L2 Cache

256 KB L2 Caches Up to 15 MB L3 Cache

DDR3 Memory Controllers

QuickPath Interconnect

unboxing Multicore Computers

24

Intel Core i7 Organization Introduced in November 2008, 2nd Generation in January 2011 Up to six x86 SMT processors Dedicated L2, shared L3 cache Speculative pre-fetch for caches On chip DDR3 memory controller  

Three 8 byte channels (192 bits) giving 32GB/s No front side bus Turbo Boost  Clock frequency is incrementally adjusted on demand.

Hardware Virtualization 

A facility that allows multiple operating systems to simultaneously processor resources in a safe and efficient manner

Multicore Computers

25

Intel Core i7 Organization QuickPath Interconnection      

Cache coherent point-to-point link High speed communications between processor chips 6.4G transfers per second, 16 bits per transfer Dedicated bi-directional pairs Total bandwidth 25.6GB/s Intel QPI animated demo

Multicore Computers

26

Intel Multicore Processors Cache Latency

Compare Intel Core Processors General Information about Intel Processors

Multicore Computers

27

Text Book References These topics are covered in  Stallings

- chapter 18

Multicore Computers

28

Multicore Processors

END Multicore Computers

29