Apr 18, 2010 - National Research Center of the Helmholtz Association www.kit.edu ... User IP. Interface. MicroBlaze/. PowerPC. User IP. Interface. UART. OP. B-B to PC ..... We will call this approach i-Core (related to the latest results in.
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs Michael Hübner1, Diana Göhringer2, Juanjo Noguera3, Jürgen Becker1 1
Karlsruhe K l h Institute I tit t off T Technology h l (KIT) (KIT), Germany G Fraunhofer IOSB, Germany 3 Xilinx Inc., Dublin 2
Institut für Technik der Informationsverarbeitung (ITIV)
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu
Outline Introduction and motivation Related work Concept of Fast Simplex Link (FSL) internal configuration access portt (ICAP) Realization and results C Conclusion l i and d ffuture t work k
2
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
Institut für Technik der Informationsverarbeitung (ITIV)
Introduction and motivation Dynamic and partial reconfiguration: “parts of a configuration can be substituted while other parts stay operative without any disturbance” Spatial and temporal partitioning exploitation to increase performance f and d to t reduce d power consumption ti In a processor based design (MicroBlaze), the configuration access port is one of the “devices” devices on the OPB or PLB bus Why is it not a part of the processor’s processor s microarchitecture? spatial parallelism Temporall T parallelism FPGA 3
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
Institut für Technik der Informationsverarbeitung (ITIV)
Traditional usage g of the ICAP: 1. Dynamic y Reconfiguration g
ICAP was traditionally used for run-time adaptive systems: Loading partial bitreams from external memory memory, transfer to ICAP FPGA User-IP User IP Interface
OPB-B Bus
Module A
Module B
MicroBlaze/ PowerPC
UART
FlashController
Module C Module D
HWIcap
to PC
External Partial Flash Module Memory
31bit for Virtex 5 and Virtex 6 16bit for Spartan 6
Module E 4
4/18/2010
ICAP
Prof. Max Mustermann – Fakultät für Musterwissenschaften: Präsentationstitel
Institut für Technik der Informationsverarbeitung (ITIV)
Traditional usage of the ICAP (cont) : Data transfer through g read- and writeback ICAP was used to transfer data from one BRAM to another Reduction of signal line utilization utilization, novel degree of freedom
Example with 7 encapsulated modules
Test application: pp Slide show on Virtex 2 VGA core has no connection via signal line to PPC
This example: Sander et. Al.: „ Data Reallocation by Exploiting FPGA Configuration Mechanisms”, RAW 2008, April V Very nice i extension: t i Shelbourne et. Al.: “MetaWire: Using FPGA Configuration Circuitry to Emulate a Network-on-Chip“, FPL 2008, September „ Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
Institut für Technik der Informationsverarbeitung (ITIV)
ICAP is more than a configuration g port… p ICAP can be used in different modes: Access port to the reconfigurable logic, consuming configuration data Access port to the reconfigurable logic logic, producing configuration data (e.g. Readback of configuration data for safety reasons (bit flips etc.) Access port to processing elements already configured on the FPGA, consuming (write mode) data to be processed Access port to processing elements already configured on the FPGA, producing (read mode) data which were processed FPGA User-IP Interface
OPB-Bus
Module A
Module B
MicroBlaze/ PowerPC
UART
FlashController
Module C M d l D Module
to PC
External Flash Memory
In general two modes of operation: 1. for hardware reconfiguration purposes 2. for data transfer purposes
HWIcap p
Module E ICAP
6
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
Institut für Technik der Informationsverarbeitung (ITIV)
Realization alternatives for p processor – ICAP connection
Several approaches exist where a processor triggers the reconfiguration of an accelerator FPGA Memory Controller
PLB or OPB B
Processor (PPC or MicroBlaze)
External Memory e.g. Flash
FPGA (Xilinx Virtex 4)
HW-ICAP Xilinx
ICAP
External Memory e.g. Flash
Other Peripheral Devices
Single Si l b bus version i enables bl DMA ttransfer f for bistreams to the ICAP
7
Dual bus version (newest version) Efficient usage for MicroBlaze
Memory Controller (MPMC)
PLB or XCL
Processor (PPC405 or MicroBlaze)
PLB
Other Peripheral Devices
HW-ICAP Xilinx
ICAP
Numerous previous work work, e e.g.: g: Blodget et. Al.:“A Lightweight Approach for Embedded Reconfiguration of FPGAs“, DATE 2003 Claus et. Al.: „A multi-platform controller allowing for maximum 4/18/2010 Prof. Max Mustermann – Fakultät für Musterwissenschaften: Institut für Technik der Informationsverarbeitung (ITIV) Präsentationstitel dynamic partial reconfiguration throughput“, FPL 2008
Novel exploitation possibilities of the ICAP in adaptive microprocessor p architectures: The i-Core
Lets assume ICAP is integrated into the processor pipeline
ICAP
Extended version based on the picture used by Prof. Lizy Kurian John, Univ. Austin, Texas 8
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
That would mean: • Processor commands are reserved for the ICAP: R Reconfiguration fi i mode: d • ICAP write config. • ICAP read config. D t ttransfer Data f mode: d • ICAP write process data • ICAP read process data • The ICAP is included directly into the data path of the processor lowest delay for data transfer see ICAP from „the software point of view“ and write simple programs for accessing it Institut für Technik der Informationsverarbeitung (ITIV)
Exploitation p of the novel concept p Pipeline with Configuration Port
The novel concept increases the flexibility of a FPGA based processor tremendously ICAP
The ICAP as data sink and source p p can be seen as a multipurpose ALU Legend: -IF: -ID: From the user (programmer) point of view -EX: -MEM: the hardware complexity is hidden through -WB: the p provided libraries Accessible with standard C construction Further hardware abstraction which definitely will increase the acceptance of run-time adaptive hardware
9
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
Instruction fetch Instruction decode Execute Memory access Writeback
Institut für Technik der Informationsverarbeitung (ITIV)
Exploitation p of the novel concept p (cont) ( ) The novel concept enables the run-time adaptation of the processors microarchitecture: Realized instruction (within the ISA) be reconfigured at runtime and realizes therefore a dynamic reconfigurable instruction set processor In g general: An adaptive microarchitecture is p possible: p - Power and energy reduction via pipeline balancing - Using ipc (instruction per cycles) variation reduce power consumption y pp p - Dynamic instruction level p parallelism pipeline adaptation - Adaptive issue queue for reduced power at high performance (Please see in the our paper the references, they did not use this novel approach!) Decentralized processor approach: ICAP connects cores on any position of the chip
Novel quality of processors: The i-Core provides the run-time adaptation of the microarchitecture Pipeline with Configuration Port
An example from a real experiment: adaptation of pipeline from 5 to 3 stages reduction of 90mW power consumption! (Publication under review: ReCoSoC 2010) 10
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
ICAP Legend: -IF: -ID: -EX: für Institut -MEM: -WB:
Instruction fetch Instruction decode Execute Technik der Informationsverarbeitung Memory access Writeback
(ITIV)
(One of the) First steps to the i-Core: FSL-ICAP Hardware view Connecting the ICAP as near as possible to the processor core of MicroBlaze: the FSL connection provides a connection with the latency of only one clock cycle
FPGA (Xilinx Virtex 4) External Memory e.g. Flash
Memory Controller (MPMC)
PLB or XCL
Processor (PPC405 or MicroBlaze) FSL Controller (FSM)
PLB
Other Peripheral P i h l Devices
Other Peripheral Devices
ICAP
11
4/18/2010
Fast dynamic and partial reconfiguration Data Path with low Hardware overhead on Xilinx FPGAs
Concept: Transfer the configuration as well as data to be processed by IP cores through the processor. Simple programming model, no hardware knowledge required required.
Approach does not target highest perfromance in data throughput throughput. It focuses to embedd the ICAP into C world. 2-3x 3x speedup But side effect: 2 in comparison to XPS ICAP (through reduction of required clock cycles) y ) Institut für Technik der Informationsverarbeitung (ITIV)
(One of the) First steps to the i-Core: FSL-ICAP Software view /* Function for Bitstream transfer from external memoryy to FSL- ICAP */ 1 int loadBitfromSRAM(Xuint32 baseaddr, Xuint32 size) 2 { 3 Xuint32 conf_word, size4, output_0, i; 4 5 /*point / point to Addr in Ext Ext. Mem *// 6 Xuint32 *pointword = (Xuint32*) baseaddr; 7 8 size4 = size >> 2; // Get size in words 9 output_0 = (size4