CSE 490/590 Computer Architecture MIPS Review & Pipelining I ...

31 downloads 394 Views 227KB Size Report
MIPS microarchitecture ... Load/store architecture with single address mode .... Quiz 1. – Fri, 2/4. – Closed book, in-class. – 10%. • Class cancelled on Fri, 4/15.
Last Time… •  An ISA can have multiple implementations •  (Briefly) CISC vs. RISC

CSE 490/590 Computer Architecture

–  CISC: microcoded (macro instructions & microinstructions) –  RISC: combinational logic

•  MIPS microarchitecture

MIPS Review & Pipelining I

–  –  –  – 

Steve Ko

Good RISC example Fixed format instructions Load/store architecture with single address mode Simple branch

•  Will continue on this today…

Computer Sciences and Engineering University at Buffalo

CSE 490/590, Spring 2011

CSE 490/590, Spring 2011

MIPS Instruction Formats

Reg-Reg Instructions

Register-Register 31

Register-Register

26 25

Op

2

21 20

Rs1

11 10

16 15

Rs2

6 5

Rd

0

31

Func

26 25

Op

21 20

Rs1

16 15

Rs2

11 10

Rd

6 5

0

Func

Register-Immediate 31

26 25

Op

21 20

Rs1

16 15

Rd

immediate

0

•  Op (e.g., 0) encodes that this is a reg-reg instruction •  Func encodes the datapath operations (add, sub, etc.) •  Rd  (Rs1) Func (Rs2) •  ADD R1,R2,R3

Branch (Same format as Reg-Imm) 31

26 25

Op

21 20

Rs1

16 15

Rs2/Opx

immediate

0

Jump / Call 31

26 25

–  Add –  Reg[R1]  Regs[R2] + Regs[R3]

0

target

Op

CSE 490/590, Spring 2011

Reg-Imm Instructions 26 25

Op

4

Jump / Call

Register-Immediate 31

CSE 490/590, Spring 2011

3

Jump / Call 21 20

Rs1

16 15

Rd

immediate

0

31

26 25

target

Op

•  Rd  (Rs1) Op (Imm) •  ADDI R1,R2,#3

0

•  Rd  (Rs1) Op (Imm) •  J name

–  Add immediate –  Regs[R1]  Regs[R2] + 3

–  Jump –  PC6…31  name

•  LD R1,30(R2)

•  JAL name

–  Load word –  Regs[R1]  Mem[30 + Regs[R2]]

–  Jump and link –  Regs[R31]  PC + 4; PC6…31  name

•  BEQZ R4, name –  Branch equal zero –  If (Regs[R4] == 0) PC  name CSE 490/590, Spring 2011

C

5

CSE 490/590, Spring 2011

6

1

Datapath: Reg-Reg ALU Instructions RegWrite 0x4 clk

Add

Implementing MIPS:

inst inst PC

Single-cycle per instruction datapath & control logic

addr

inst

inst

Inst. Memory

clk

we rs1 rs2 rd1 ws wd rd2

ALU

z

GPRs

inst

ALU

Control

OpCode

6 0 CSE 490/590, Spring 2011

7

Datapath: Reg-Imm ALU Instructions

31

5 rs 26 25

5 rt 21 20

5 rd

5 0

16 15

11

PC

addr

inst

inst

Inst. Memory

clk

inst

ALU

31

5 rs

26 25

inst inst

inst

16 immediate

rt ← (rs) op immediate 0

9

opcode

5 rs

5 rt

rs

rt

z

ALU Control

ExtSel

OpCode

CSE 490/590, Spring 2011

ALU

Imm Ext

inst inst

6 0

16 15

Introduce muxes

we rs1 rs2 rd1 ws wd rd2

GPRs

inst

ALU Control

ExtSel

5 rt 2120

addr

Inst. Memory

clk

inst

6 opcode

PC z

Imm Ext

OpCode

8

clk

Add

GPRs

inst

0

RegWrite 0x4

we rs1 rs2 rd1 ws wd rd2

inst

rd ← (rs) func (rt)

Conflicts in Merging Datapath

clk

Add

5

CSE 490/590, Spring 2011

RegWrite 0x4

RegWrite Timing?

6 func

5 rd

5 0

6 func

immediate

rd ← (rs) func (rt) rt ← (rs) op immediate

CSE 490/590, Spring 2011

10

Datapath for Memory Instructions

Datapath for ALU Instructions

Should program and data memory be separate? RegWrite 0x4

PC clk

addr

Harvard style: separate (Aiken and Mark 1 influence) - read-only program memory -  read/write data memory

clk

Add

we rs1 rs2 rd1 ws wd rd2

inst

Inst. Memory

ALU

GPRs



z

-  Note: Somehow there must be a way to load the program memory

Imm Ext

,

ALU Control

Princeton style: the same (von Neumann’s influence) -  single read/write memory for program and data OpCode

6 0 opcode

5 rs

5 rt

rs

rt

RegDst rt / rd

5 rd

ExtSel

5 0

OpSel

6 func

immediate

CSE 490/590, Spring 2011

C

BSrc Reg / Imm

rd ← (rs) func (rt) rt ← (rs) op immediate 11

-  Note: A Load or Store instruction requires accessing the memory more than once during its execution CSE 490/590, Spring 2011

12

2

CSE 490/590 Administrivia

Load/Store Instructions:Harvard Datapath RegWrite 0x4

inst

disp

–  Projects should be done individually.

clk

•  Quiz 1

we addr

ALU

z

GPRs

Inst. Memory

clk

WBSrc ALU / Mem

we rs1 rs2 rd1 ws wd rd2

“base” addr

PC

•  Please purchase a BASYS2 board (100K) as soon as possible.

MemWrite

clk Add

–  Fri, 2/4 –  Closed book, in-class –  10%

rdata

Data Memory

Imm Ext

wdata ALU Control

•  Class cancelled on Fri, 4/15 –  Will update the schedule

OpCode RegDst

6 opcode 31

5 rs

26 25

ExtSel

5 rt 21 20

OpSel

BSrc

16 displacement 16 15

addressing mode (rs) + displacement 0

rs is the base register rt is the destination of a Load or the source for a Store CSE 490/590, Spring 2011

CSE 490/590, Spring 2011

13

MIPS Control Instructions

Conditional Branches (BEQZ, BNEZ)

Conditional (on GPR) PC-relative branch 6 opcode

5 rs

5

16 offset

PCSrc br

BEQZ, BNEZ

5 rs

5

RegWrite

16

Add Add clk

Unconditional absolute jumps 26 target

addr

PC

J, JAL

we rs1 rs2 rd1 ws wd rd2

inst

Inst. Memory

clk

•  PC-relative branches add offset×4 to PC+4 to calculate the target address (offset is in words): ±128 KB range •  Absolute jumps append target×4 to PC to calculate the target address: 256 MB range •  jump-&-link stores PC+4 into the link register (R31)

clk

z

Imm Ext

ExtSel

OpSel

BSrc

16

Register-Indirect Jump-&-Link (JALR)

RegWrite

MemWrite

PCSrc br rind

WBSrc

pc+4

RegWrite

MemWrite

WBSrc

pc+4 0x4

Add

Add Add

Add

clk

addr

clk

we rs1 rs2 rd1 ws wd rd2

inst

Inst. Memory

we rs1 rs2 rd1 ws wd rd2

clk

we addr

ALU

GPRs

z

Imm Ext

PC clk

rdata

Data Memory

addr

inst

31

Inst. Memory

OpCode RegDst

ExtSel

we addr

ALU z

Imm Ext

wdata

OpSel

clk

GPRs

ALU Control

rdata

Data Memory wdata

ALU Control

BSrc

CSE 490/590, Spring 2011

C

zero?

CSE 490/590, Spring 2011

0x4

clk

rdata

Data Memory wdata

ALU Control

15

Register-Indirect Jumps (JR)

PC

we addr

ALU

GPRs

OpCode RegDst

CSE 490/590, Spring 2011

PCSrc br rind

WBSrc

0x4

JR, JALR

6 opcode

MemWrite

pc+4

Unconditional register-indirect jumps 6 opcode

14

zero?

OpCode RegDst

17

ExtSel

OpSel

BSrc

CSE 490/590, Spring 2011

zero?

18

3

Absolute Jumps (J, JAL) PCSrc br rind jabs pc+4

Harvard-Style Datapath for MIPS

RegWrite

MemWrite

PCSrc br rind jabs pc+4

WBSrc

0x4

RegWrite

MemWrite

0x4 Add

Add Add

Add

clk

PC clk

WBSrc

addr

we rs1 rs2 rd1 ws wd rd2

31

inst

clk

Inst. Memory

we rs1 rs2 rd1 ws wd rd2

clk

we addr

ALU

GPRs

z

Imm Ext

PC clk

rdata

Data Memory

addr

inst

31

Inst. Memory

ExtSel

z

Imm Ext

wdata

OpSel

we addr

ALU

GPRs

ALU Control

OpCode RegDst

clk

rdata

Data Memory wdata

ALU Control

BSrc

zero?

OpCode RegDst

CSE 490/590, Spring 2011

19

ExtSel

OpSel

BSrc

zero?

CSE 490/590, Spring 2011

20

Single-Cycle Hardwired Control: Harvard architecture

We will assume •  clock period is sufficiently long for all of the following steps to be “completed”: 1. 2. 3. 4. 5.

Pipelined MIPS

instruction fetch decode and register fetch ALU operation data fetch if required register write-back setup time ⇒

tC > tIFetch + tRFetch + tALU+ tDMem+ tRWB

•  At the rising edge of the following clock, the PC, the register file and the memory are updated CSE 490/590, Spring 2011

21

An Ideal Pipeline stage 1

CSE 490/590, Spring 2011

22

Pipelined MIPS stage 2

stage 3

stage 4

To pipeline MIPS: •  First build MIPS without pipelining with CPI=1

•  All objects go through the same stages •  No sharing of resources between any two stages •  Propagation delay through all pipeline stages is equal •  The scheduling of an object entering the pipeline is not affected by the objects in other stages These conditions generally hold for industrial assembly lines. But can an instruction pipeline satisfy the last condition? CSE 490/590, Spring 2011

C

23

•  Next, add pipeline registers to reduce cycle time while maintaining CPI=1

CSE 490/590, Spring 2011

24

4

“Iron Law” of Processor Performance

Pipelined Datapath

Time = Instructions Cycles Time Program Program * Instruction * Cycle

0x4 Add

PC

addr rdata

we rs1 rs2 rd1 ws wd rd2 GPRs

IR

Inst. Memory

ALU

– Instructions per program depends on source code, compiler technology, and ISA

we addr rdata

Data Memory

Imm Ext

– Cycles per instructions (CPI) depends upon the ISA and the microarchitecture

wdata

write -back phase Clock period can be reduced by dividing the execution of an instruction into multiple cycles fetch phase

decode & Reg-fetch phase

execute phase

memory phase

tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably) However, CPI will increase unless instructions are pipelined CSE 490/590, Spring 2011

25

CPI Examples Microcoded machine 7 cycles 5 cycles Inst 1

10 cycles

Inst 2

– Time per cycle depends upon the microarchitecture and the base technology Microarchitecture Microcoded Single-cycle unpipelined Pipelined

CPI >1 1 1

cycle time short long short

CSE 490/590, Spring 2011

26

Technology Assumptions

Time

•  A small amount of very fast memory (caches) backed up by a large, slower memory

Inst 3

•  Fast ALU (at least for integers)

3 instructions, 22 cycles, CPI=7.33

•  Multiported Register files (slower!)

Unpipelined machine Inst 1

Inst 2

Thus, the following timing assumption is reasonable

Inst 3

3 instructions, 3 cycles, CPI=1

tIM ≈ tRF ≈ tALU ≈ tDM ≈ tRW

Pipelined machine Inst 1 Inst 2 Inst 3

3 instructions, 3 cycles, CPI=1

CSE 490/590, Spring 2011

27

A 5-stage pipeline will be the focus of our detailed design - some commercial designs have over 30 pipeline stages to do an integer add! CSE 490/590, Spring 2011

28

Acknowledgements •  These slides heavily contain material developed and copyright by –  Krste Asanovic (MIT/UCB) –  David Patterson (UCB)

•  And also by: –  –  –  – 

Arvind (MIT) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB)

•  MIT material derived from course 6.823 •  UCB material derived from course CS252

CSE 490/590, Spring 2011

C

29

5