Redefis: An SoC Platform for Implementing ApplicationSpecific or User-Custom Logic Kazuaki Murakami Kyushu University
[email protected]
1
Five Ways to Design & Implement Custom Logic (from MPSoC’04) How to Implement General Logic • General-Purpose Processor + Software
GeneralPurpose Processor
Memory
Memory
How to Implement Custom Logic • “From Scratch” Approach – Design new hardware every time • IP-Core-Based Approach – Reuse existing hardware designs, or IP cores • Platform-Based Approaches – Processor + Software – Configurable Processor + Software – Reconfigurable Hardware ÎReconfigurable Processor + Software
Memory 2
Five Ways to Design & Implement Custom Logic (from MPSoC’04) “From Scratch”/ IP-Core-Based Design
Platform-Based Design
Processor + Software Traditional Processor + Software
Configurable Stuff
Configurable Processor + Software
Reconfigurable Stuff Hardware Hardwired Logic
Reconfigurable Processor + Software
Reconfigurable Hardware
FPGA
3
Five SoC Design Flows (from MPSoC’04) HDL/C
Behavior Level
HDL/C
+
C
C
or RT Level
HDL Logic Synthesis
Hardwired Logic
HDL Behavior Synthesis
Hardware Configuration Data
Reconfigurable Hardware
HW Synthesis
Software Processor Configuration Data Reconfigurable Processor
+ Compilation
Compilation
Software
Software
Configurable Processor
Traditional Processor
Platform-Based Design
4
Redefis (Redefinable ISA Processor): A Reconfigurable Processor • Normal Reconfigurable Processor
Algorithm
Algorithm C
HDL Compilation
HW Synthesis HW
SW
Configuration Setting
Program Execution
• Redefis (Redefinable ISA Processor) ISA Gen
C Compilation
ISA HW
SW Program Execution
Configuration Setting
Processor Configuration Data
Processor Configuration Data
Reconfigurable Processor
Reconfigurable Processor 5
Where Does Redefis Go?
Generalpurpose
Traditional Processor
Programmability/ Reconfigurability
Hardwired No
Application -specific
Redefis
FPGA
Yes Processor+Software
Hardware 6
Vulcan: An Implementation of Redefis Vulcan binaryInstruction execution control
Program Memory
64-bit width
Controller
Data Memory
Reconfigurable Datapath (RDP)
code store 128 PE’s (Programmable Elements)
Register Files 256-bit width
Configuration Memory
RDP configuration data for 128 custom instructions
7
Vulcan Chip & Board
8
Reconfigurable Datpath (RDP) BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BLSW LHL BIBC BIBC
L V L
BFBC BFBC
BLSW L V L
BLSW BFBC BFBC
L V L
BLSW BFBC BFBC
L V L
BLSW BFBC BFBC
L V L
BOBC BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
BIBC
BFBC
BFBC
BFBC
BFBC
BOBC
OUTPUT
INPUT
BIBC
9
Reconfigurable Datpath (RDP) 16行×8列 128 PE’s 128個のPE (16*8)
6-input & 2output LUT (Lookup Table)
PE・・・
PE SW Switches VL,HL間の between スイッチVL and HL’s
PE
PE
PE・・・
PE SW
PE・・・ SW
・・・
SW
PE SW
SW
PE・・・
・・・
VL (64 bits)
SW
PE
・・・
SW
SW
PE
SW
HL (64 bits)
PE SW
SW 10
Instruction Execution Flow Program Memory Controller
Data Memory
Reconfigurable Datapath (RDP)
Register Files
Configuration Memory 11
Instruction Execution Flow - Phase 1: I-Fetch Program Memory
I-Fetch
Controller
Data Memory
Reconfigurable Datapath (RDP)
Register Files
Configuration Memory 12
Instruction Execution Flow - Phase 1: I-Fetch Program Memory
I-Fetch
Controller opcode (configuration memory address)
Data Memory
Branch Data Control (2 operands) Cond. Memory Addr. Reconfigurable Register Datapath Files (RDP)
Register No.
Configuration Memory 13
Instruction Execution Flow - Phase 1: PC Update Program Memory
I-Fetch PC increment
Controller
Data Memory
Reconfigurable Datapath (RDP)
Register Files
Configuration Memory 14
Instruction Execution Flow
- Phase 2: Configuration Data Load Program Memory Controller
Data Memory
Reconfigurable Datapath (RDP) Configuration Memory
Register Files Configuration Data Load
15
Instruction Execution Flow - Phase 2: Data Load Program Memory Data Load
Data Memory
Controller
Reconfigurable Datapath (RDP)
Register Files
Configuration Memory 16
Instruction Execution Flow - Phase 3: Execution Program Memory Execution Data Load
Data Memory
Controller
Reconfigurable Datapath (RDP)
Register Files
Configuration Memory 17
Instruction Execution Flow - Phase 3: Execution Program Memory Execution Data Load
Data Memory
Execution Results
Controller
Reconfigurable Datapath (RDP)
Register Files
Configuration Memory 18
Vulcan Instruction Pipeline Time Phase1 Phase2 Phase3 Phase1 Phase2 Phase3 Phase1 Phase2 Phase3 1 clock cycle
・・・・・・・・・・・・
19
Redefis (Redefinable ISA Processor): A Reconfigurable Processor • Normal Reconfigurable Processor
Algorithm
Algorithm C
HDL Compilation
HW Synthesis HW
SW
Configuration Setting
Program Execution
• Redefis (Redefinable ISA Processor) ISA Gen
C Compilation
ISA HW
SW Program Execution
Configuration Setting
Processor Configuration Data
Processor Configuration Data
Reconfigurable Processor
Reconfigurable Processor 20
Development Tool Chain Functional Requirement Specification Pre-defined Inst. Lib. C Source Code
ISA Generator
Inst. Library (XML or VHDL)
(w/Source Transformer) Transformed C Code
ISA
Retargettable Compiler (Object Generator)
Place and Route
Assembly Code
Result File (Hard-mapped result)
Data
Instruction Creator
Assembler Object code (Exe or Dll)
Vulcan ISS
Result / Profile
Vulcan 21
Demo
22
DES: A Redefis Application
XOR
Cleartext (64 bits)
Key (56 bits)
Initial Permutation(IP)
Create 48-bit Keys
F-Function
16 steps
Cleartext (32 bits) Key (48 bits) XOR
F-Function
F-Function Extension/Permutation XOR
Inverse Initial Permutation(IP-1)
S-BOX
Ciphertext (64 bits)
Permutation
23
DES: A Redefis Application Instruction
What to do
Cleartext (64 bits)
Key (56 bits)
0
Read key and permute it (PC-1)
Instruction 1
Instruction 0
1
Read cleartext and permute it (IP)
Instruction 2 or 3
2
1-bit left rotate shift (LS1)
3
2-bit left rotate shift (LS2)
Instruction 4
4
Permutation (PC-2), Ffunction, and XOR
Loop=16?
5
Inverse initial permutation (IP-1), and output ciphertext
Instruction 5 Ciphertext (64 bits)
• Object code size: 24 instructions • Dynamic instruction count: 35 instructions • Vulcan (6.25MHz) vs. P4 (2.4GHz): – Throughput: 570KB/s vs. 150KB/s
24