MIPS microarchitecture ... Load/store architecture with single address mode ....
Quiz 1. – Fri, 2/4. – Closed book, in-class. – 10%. • Class cancelled on Fri, 4/15.
Last Time… • An ISA can have multiple implementations • (Briefly) CISC vs. RISC
CSE 490/590 Computer Architecture
– CISC: microcoded (macro instructions & microinstructions) – RISC: combinational logic
• MIPS microarchitecture
MIPS Review & Pipelining I
– – – –
Steve Ko
Good RISC example Fixed format instructions Load/store architecture with single address mode Simple branch
• Will continue on this today…
Computer Sciences and Engineering University at Buffalo
CSE 490/590, Spring 2011
CSE 490/590, Spring 2011
MIPS Instruction Formats
Reg-Reg Instructions
Register-Register 31
Register-Register
26 25
Op
2
21 20
Rs1
11 10
16 15
Rs2
6 5
Rd
0
31
Func
26 25
Op
21 20
Rs1
16 15
Rs2
11 10
Rd
6 5
0
Func
Register-Immediate 31
26 25
Op
21 20
Rs1
16 15
Rd
immediate
0
• Op (e.g., 0) encodes that this is a reg-reg instruction • Func encodes the datapath operations (add, sub, etc.) • Rd (Rs1) Func (Rs2) • ADD R1,R2,R3
Branch (Same format as Reg-Imm) 31
26 25
Op
21 20
Rs1
16 15
Rs2/Opx
immediate
0
Jump / Call 31
26 25
– Add – Reg[R1] Regs[R2] + Regs[R3]
0
target
Op
CSE 490/590, Spring 2011
Reg-Imm Instructions 26 25
Op
4
Jump / Call
Register-Immediate 31
CSE 490/590, Spring 2011
3
Jump / Call 21 20
Rs1
16 15
Rd
immediate
0
31
26 25
target
Op
• Rd (Rs1) Op (Imm) • ADDI R1,R2,#3
0
• Rd (Rs1) Op (Imm) • J name
– Add immediate – Regs[R1] Regs[R2] + 3
– Jump – PC6…31 name
• LD R1,30(R2)
• JAL name
– Load word – Regs[R1] Mem[30 + Regs[R2]]
– Jump and link – Regs[R31] PC + 4; PC6…31 name
• BEQZ R4, name – Branch equal zero – If (Regs[R4] == 0) PC name CSE 490/590, Spring 2011
C
5
CSE 490/590, Spring 2011
6
1
Datapath: Reg-Reg ALU Instructions RegWrite 0x4 clk
Add
Implementing MIPS:
inst inst PC
Single-cycle per instruction datapath & control logic
addr
inst
inst
Inst. Memory
clk
we rs1 rs2 rd1 ws wd rd2
ALU
z
GPRs
inst
ALU
Control
OpCode
6 0 CSE 490/590, Spring 2011
7
Datapath: Reg-Imm ALU Instructions
31
5 rs 26 25
5 rt 21 20
5 rd
5 0
16 15
11
PC
addr
inst
inst
Inst. Memory
clk
inst
ALU
31
5 rs
26 25
inst inst
inst
16 immediate
rt ← (rs) op immediate 0
9
opcode
5 rs
5 rt
rs
rt
z
ALU Control
ExtSel
OpCode
CSE 490/590, Spring 2011
ALU
Imm Ext
inst inst
6 0
16 15
Introduce muxes
we rs1 rs2 rd1 ws wd rd2
GPRs
inst
ALU Control
ExtSel
5 rt 2120
addr
Inst. Memory
clk
inst
6 opcode
PC z
Imm Ext
OpCode
8
clk
Add
GPRs
inst
0
RegWrite 0x4
we rs1 rs2 rd1 ws wd rd2
inst
rd ← (rs) func (rt)
Conflicts in Merging Datapath
clk
Add
5
CSE 490/590, Spring 2011
RegWrite 0x4
RegWrite Timing?
6 func
5 rd
5 0
6 func
immediate
rd ← (rs) func (rt) rt ← (rs) op immediate
CSE 490/590, Spring 2011
10
Datapath for Memory Instructions
Datapath for ALU Instructions
Should program and data memory be separate? RegWrite 0x4
PC clk
addr
Harvard style: separate (Aiken and Mark 1 influence) - read-only program memory - read/write data memory
clk
Add
we rs1 rs2 rd1 ws wd rd2
inst
Inst. Memory
ALU
GPRs
z
- Note: Somehow there must be a way to load the program memory
Imm Ext
,
ALU Control
Princeton style: the same (von Neumann’s influence) - single read/write memory for program and data OpCode
6 0 opcode
5 rs
5 rt
rs
rt
RegDst rt / rd
5 rd
ExtSel
5 0
OpSel
6 func
immediate
CSE 490/590, Spring 2011
C
BSrc Reg / Imm
rd ← (rs) func (rt) rt ← (rs) op immediate 11
- Note: A Load or Store instruction requires accessing the memory more than once during its execution CSE 490/590, Spring 2011
12
2
CSE 490/590 Administrivia
Load/Store Instructions:Harvard Datapath RegWrite 0x4
inst
disp
– Projects should be done individually.
clk
• Quiz 1
we addr
ALU
z
GPRs
Inst. Memory
clk
WBSrc ALU / Mem
we rs1 rs2 rd1 ws wd rd2
“base” addr
PC
• Please purchase a BASYS2 board (100K) as soon as possible.
MemWrite
clk Add
– Fri, 2/4 – Closed book, in-class – 10%
rdata
Data Memory
Imm Ext
wdata ALU Control
• Class cancelled on Fri, 4/15 – Will update the schedule
OpCode RegDst
6 opcode 31
5 rs
26 25
ExtSel
5 rt 21 20
OpSel
BSrc
16 displacement 16 15
addressing mode (rs) + displacement 0
rs is the base register rt is the destination of a Load or the source for a Store CSE 490/590, Spring 2011
CSE 490/590, Spring 2011
13
MIPS Control Instructions
Conditional Branches (BEQZ, BNEZ)
Conditional (on GPR) PC-relative branch 6 opcode
5 rs
5
16 offset
PCSrc br
BEQZ, BNEZ
5 rs
5
RegWrite
16
Add Add clk
Unconditional absolute jumps 26 target
addr
PC
J, JAL
we rs1 rs2 rd1 ws wd rd2
inst
Inst. Memory
clk
• PC-relative branches add offset×4 to PC+4 to calculate the target address (offset is in words): ±128 KB range • Absolute jumps append target×4 to PC to calculate the target address: 256 MB range • jump-&-link stores PC+4 into the link register (R31)
clk
z
Imm Ext
ExtSel
OpSel
BSrc
16
Register-Indirect Jump-&-Link (JALR)
RegWrite
MemWrite
PCSrc br rind
WBSrc
pc+4
RegWrite
MemWrite
WBSrc
pc+4 0x4
Add
Add Add
Add
clk
addr
clk
we rs1 rs2 rd1 ws wd rd2
inst
Inst. Memory
we rs1 rs2 rd1 ws wd rd2
clk
we addr
ALU
GPRs
z
Imm Ext
PC clk
rdata
Data Memory
addr
inst
31
Inst. Memory
OpCode RegDst
ExtSel
we addr
ALU z
Imm Ext
wdata
OpSel
clk
GPRs
ALU Control
rdata
Data Memory wdata
ALU Control
BSrc
CSE 490/590, Spring 2011
C
zero?
CSE 490/590, Spring 2011
0x4
clk
rdata
Data Memory wdata
ALU Control
15
Register-Indirect Jumps (JR)
PC
we addr
ALU
GPRs
OpCode RegDst
CSE 490/590, Spring 2011
PCSrc br rind
WBSrc
0x4
JR, JALR
6 opcode
MemWrite
pc+4
Unconditional register-indirect jumps 6 opcode
14
zero?
OpCode RegDst
17
ExtSel
OpSel
BSrc
CSE 490/590, Spring 2011
zero?
18
3
Absolute Jumps (J, JAL) PCSrc br rind jabs pc+4
Harvard-Style Datapath for MIPS
RegWrite
MemWrite
PCSrc br rind jabs pc+4
WBSrc
0x4
RegWrite
MemWrite
0x4 Add
Add Add
Add
clk
PC clk
WBSrc
addr
we rs1 rs2 rd1 ws wd rd2
31
inst
clk
Inst. Memory
we rs1 rs2 rd1 ws wd rd2
clk
we addr
ALU
GPRs
z
Imm Ext
PC clk
rdata
Data Memory
addr
inst
31
Inst. Memory
ExtSel
z
Imm Ext
wdata
OpSel
we addr
ALU
GPRs
ALU Control
OpCode RegDst
clk
rdata
Data Memory wdata
ALU Control
BSrc
zero?
OpCode RegDst
CSE 490/590, Spring 2011
19
ExtSel
OpSel
BSrc
zero?
CSE 490/590, Spring 2011
20
Single-Cycle Hardwired Control: Harvard architecture
We will assume • clock period is sufficiently long for all of the following steps to be “completed”: 1. 2. 3. 4. 5.
Pipelined MIPS
instruction fetch decode and register fetch ALU operation data fetch if required register write-back setup time ⇒
tC > tIFetch + tRFetch + tALU+ tDMem+ tRWB
• At the rising edge of the following clock, the PC, the register file and the memory are updated CSE 490/590, Spring 2011
21
An Ideal Pipeline stage 1
CSE 490/590, Spring 2011
22
Pipelined MIPS stage 2
stage 3
stage 4
To pipeline MIPS: • First build MIPS without pipelining with CPI=1
• All objects go through the same stages • No sharing of resources between any two stages • Propagation delay through all pipeline stages is equal • The scheduling of an object entering the pipeline is not affected by the objects in other stages These conditions generally hold for industrial assembly lines. But can an instruction pipeline satisfy the last condition? CSE 490/590, Spring 2011
C
23
• Next, add pipeline registers to reduce cycle time while maintaining CPI=1
CSE 490/590, Spring 2011
24
4
“Iron Law” of Processor Performance
Pipelined Datapath
Time = Instructions Cycles Time Program Program * Instruction * Cycle
0x4 Add
PC
addr rdata
we rs1 rs2 rd1 ws wd rd2 GPRs
IR
Inst. Memory
ALU
– Instructions per program depends on source code, compiler technology, and ISA
we addr rdata
Data Memory
Imm Ext
– Cycles per instructions (CPI) depends upon the ISA and the microarchitecture
wdata
write -back phase Clock period can be reduced by dividing the execution of an instruction into multiple cycles fetch phase
decode & Reg-fetch phase
execute phase
memory phase
tC > max {tIM, tRF, tALU, tDM, tRW} ( = tDM probably) However, CPI will increase unless instructions are pipelined CSE 490/590, Spring 2011
25
CPI Examples Microcoded machine 7 cycles 5 cycles Inst 1
10 cycles
Inst 2
– Time per cycle depends upon the microarchitecture and the base technology Microarchitecture Microcoded Single-cycle unpipelined Pipelined
CPI >1 1 1
cycle time short long short
CSE 490/590, Spring 2011
26
Technology Assumptions
Time
• A small amount of very fast memory (caches) backed up by a large, slower memory
Inst 3
• Fast ALU (at least for integers)
3 instructions, 22 cycles, CPI=7.33
• Multiported Register files (slower!)
Unpipelined machine Inst 1
Inst 2
Thus, the following timing assumption is reasonable
Inst 3
3 instructions, 3 cycles, CPI=1
tIM ≈ tRF ≈ tALU ≈ tDM ≈ tRW
Pipelined machine Inst 1 Inst 2 Inst 3
3 instructions, 3 cycles, CPI=1
CSE 490/590, Spring 2011
27
A 5-stage pipeline will be the focus of our detailed design - some commercial designs have over 30 pipeline stages to do an integer add! CSE 490/590, Spring 2011
28
Acknowledgements • These slides heavily contain material developed and copyright by – Krste Asanovic (MIT/UCB) – David Patterson (UCB)
• And also by: – – – –
Arvind (MIT) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB)
• MIT material derived from course 6.823 • UCB material derived from course CS252
CSE 490/590, Spring 2011
C
29
5