Beehive a Multi-core Platform for Beehive, a Multi core ... - MIT CSAIL

0 downloads 98 Views 797KB Size Report
Jan 3, 2010 - N b t dd i ( t f). – No byte addressing (sort-of). • Per-core 1K-word D-cache. I-cache. D-cache. Per c
Beehive, a Multi Beehive Multi-core core Platform for Low-level Systems Research

Andrew d Birrell, ll Tom Rodeheffer, d h ff Chuck Thacker Microsoft Research, Silicon Valley

Ingredients • Xilinx field-programmable gate array (FPGA) • XUPV5 development board ($750 academic) – Virtex-5 XC5VLX110T: • 17280 x 4 flip-flops • 17280 1 280 x 4 L LUTs T • 148 x 4KByte BRAMs – Gbit Ethernet – 2 GB DRAM

• Or O BEE3 board b d with ith more and d bigger bi FPGA FPGAs January 2010

Beehive, a Multi-core Platform ...

2

Results so far • Multi-core RISC (entirely home-grown) • 100 MHz, one cycle per instruction (if cached) • 13 cores per FPGA (using only half the FPGA) • C compiler (and most of an MSIL one) • Completely C l t l malleable ll bl architecture hit t

January 2010

Beehive, a Multi-core Platform ...

3

Goals • Have a mutable computer architecture

– M Make k changes h in i h hours or days, d nott months th – More realistically than any simulation

• Design blank-sheet system software – Without an existing OS – In high-level h h l l languages l (like (l k C#)

• Do both: systems research  architecture • Usable by y universities January 2010

Beehive, a Multi-core Platform ...

4

Research Examples (Hypothetical) • Forget shared memory, use message passing • Transactional memory …

– Apples Apples-to-apples to apples comparison with monitors/CVs

• Do we really y need … – – – – –

Coherent shared memory? Interrupts? Virtual memory? Kernel mode? An operating system?

January 2010

Beehive, a Multi-core Platform ...

5

Per-core CPU rw i t [31 27] instx[31:27]

ra

instx[14:10]

rb

pcx

w

wa

+1 pcInc

Registers 32 X 32

doJump pcMux

ax a

pcd1

bx

Instruction Cache 1K X 32

{inst[31:4], 4'b0} fwdA

lli

fwdB

instx

a

b

link

pc

inst

Read Queue

inst

Data from memory

Constant

ra overrides

Const

Address Queue

amux Function left, shift, arith

bmux Address to memory

Add, Sub, Logic Shift outx

Write Queue

out

Data to memory out

January 2010

Beehive, a Multi-core Platform ...

6

Instructions • 32-bit word (I and D) • 32 x 32 32-bit bi registers i Ra 31

Rw 27 26

Count 22 21

Rb 17 16

??

Function 10

9

8

6

5

4 3

Op 0

• Rw = (Ra Function Rb) Op Count

– Function = add, subtract, and, or, xor, etc – Op p = shift left, shift right, g cyclic y shift, etc.

• Variations for jumps and memory access January 2010

Beehive, a Multi-core Platform ...

7

Memory • 32-bit word, word addressed – No N byte b t addressing dd i ((sort-of) t f)

I-cache

• Per Per-core core 1K 1K-word word D D-cache cache • Per-core 1K-word I-cache

– 8-word lines, direct mapped pp

D-cache

CPU core

• Half of address space is I/O space

January 2010

Beehive, a Multi-core Platform ...

8

Memory Instructions • Separate addressing and data access – Three Th per-core queues: AQ, AQ RQ, RQ WQ

• Read:

AQR r1 r2 0 … LD r4 4 RQ 0

• Write:

AQW r1 r2 0 LD WQ r4 0

January 2010

// Writes r1 but also appends to AQR // Then some time later … // Moves M memory data d from f RQ to r4 4

// Writes r1 but also appends to AQW // Appends data from r4 to WQ Beehive, a Multi-core Platform ...

9

Architectural Curiosities • No memory coherence • No interrupts • No memory protection • No virtual memory • No N kernel k l mode d • No memory bus or I/O bus … January 2010

Beehive, a Multi-core Platform ...

10

Interconnect: the Ring • Minimize wires • Passes through:

– Each core – Ethernet controller – DRAM controller

• “train”: token + contents

– Node can modify or append – Buffered in DRAM controller node

• Used for real memory and inter-core messages January 2010

Beehive, a Multi-core Platform ...

11

Inter-core Messages • Sent by core #x, destination core #y • Up to 60 words • Doesn’t use the memory system • No wake-ups: each core polls its message queue • Accessed A d by b placing l i an I/O address dd on AQ

January 2010

Beehive, a Multi-core Platform ...

12

Inter-core Binary Semaphores • Abstractly, each is a value in [0..1] s.t.

– “P” = atomic( t i ( if > 0 then th d decrementt else l fail) f il) – “V” = atomic( if == 0 then increment else nothing)

• Implemented by 64-bit register per core

– Semaphore’s “value” is sum of its value in registers

• Uses ring messages, not the memory system • Polling, not wake-ups

– E.g. g “acquire q mutex” spins p until “P” succeeds.

January 2010

Beehive, a Multi-core Platform ...

13

Using the Ring: Ethernet • Ethernet is just a specialized core, #14

– With an Eth Ethernett attached, tt h d and d no reall memory

• Uses DMA (via ring) for packet contents • Controlled by y messages:

– Core #n -> core #14: use buffers from address X – Core #n -> core #14: send packet at address Y – Core #14 -> core #n: packet arrived, arrived at address Z

• One MAC address p per core

– Broadcast/multicast not fully supported

January 2010

Beehive, a Multi-core Platform ...

14

Software Tools • GCC

– C tto assembler bl

• Basm

– Custom assembly language to relocatable binary

• Bld

– Link binaries into final bootfile image

• Hard-wired bootstrap code

– Use TFTP to load bootfile into DRAM

January 2010

Beehive, a Multi-core Platform ...

15

C Support Libraries • Parts of libc: malloc, free, printf

– “malloc” “ ll ” uses separate t cache h lines li f for each h item it – “int i CACHELINE” for global in separate cache line

• Single-core libraries:

– thread/mutex/CV – IP, P ARP, RP UDP, DP DHCP, DHCP DNS, DN TCP

• Basic multi multi-core core support in “libmc”: libmc :

– Library implements “main”; application provides: • “mcinit” called first, “mcmain” called per-core – Library provides inter-core putchar, malloc, free

January 2010

Beehive, a Multi-core Platform ...

16

Programming in C# and its friends • Strategy: just compile MSIL (bytecodes)

– P Parse via i reflection fl ti – Covers numerous languages with no extra effort

• Tactic: MSIL to C • Status

– First 90% works – Working on the rest (particularly exceptions)

• By-product: y p portable p MSIL-to-C compiler p January 2010

Beehive, a Multi-core Platform ...

17

Debugging (work-in-progress) • Remote debug with GDB via TCP over Ethernet • Dedicate core #1 for teledebug nub

– Dedicated code, code with no bugs  – High-level debugging down to the bare metal

• “debug unit” at each core #n: – – – – –

Breakpoint stops core #n, sends message to core #1 Or “kiss Or, kiss of death death” message from #1 stops core #n Message from #1 to core #n resumes execution Details: cache flushing, RQ, WQ, condition codes Debug unit is working today (used by libmc)

January 2010

Beehive, a Multi-core Platform ...

18

Creeping Kernelism • Special hardware state after breakpoints • Time-slice breakpoints triggered by core #1? • Breakpoints look a lot like slow interrupts? • Base-limit registers for multi-image programs? • Etc. • But still: adding features only if we need them January 2010

Beehive, a Multi-core Platform ...

19

What’s Next • Barrelfish port (MSR Cambridge and ETH) • Try it out on some students (MIT) • Implement transactional memory

– Not “software transactional memory” y 

• Use monitors instead of lock to aid coherence? • Give it away to more universities January 2010

Beehive, a Multi-core Platform ...

20