2011 21st International Conference on Field Programmable Logic and Applications
Optimizing an Open-Source Processor for FPGAs: A Case Study
Lyonel Barthe, Luís Vitório Cargnini, Pascal Benoit, and Lionel Torres
LIRMM, UMR 5506, CNRS, University of Montpellier 2
161, rue Ada, 34095 Montpellier, France
Abstract—Optimizing a processor for FPGA architectures is a challenging task. In this paper, we attempt to bridge the performance gap between commercial and open-source processors by introducing various design and implementation strategies at the register transfer abstraction level, where most optimizations require several design trade-offs to ensure an efficient and proper use of available resources. Using an open-source processor as a case study, we demonstrate the effectiveness of the proposed methods through a set of synthesis and benchmark results.

Index Terms—FPGA-optimized soft-core processor, open-source design.
978-0-7695-4529-5/11 $26.00 © 2011 IEEE DOI 10.1109/FPL.2011.107

I. INTRODUCTION

A. Context and Motivation

Field Programmable Gate Arrays (FPGAs), once limited to glue-logic applications, have become large, complex, and high-performance System-on-Chip (SoC) platforms that typically contain dedicated multiplier blocks and memory resources for high-speed digital systems. Thanks to their flexibility, soft-core processors have been widely adopted in FPGA-based SoC designs to meet the performance requirements of embedded applications. A highly optimized soft-core processor for FPGA architectures is consequently essential.

Commercial implementations include cores from FPGA vendors such as the MicroBlaze [1] from Xilinx and the Nios II [2] from Altera. As a new player in the market, ARM has introduced the Cortex-M1 [3] processor, designed for a variety of common FPGA devices. Among open-source solutions, the OpenRISC 1200 [4] and the Plasma [5] from the OpenCores organisation, as well as the LEON3 [6] from Gaisler Research, are probably the best-known projects. For many years, the free availability of their source code has attracted both the scientific and educational communities. More recently, in [7], a MicroBlaze instruction set compliant processor, called the SecretBlaze, was proposed to explore hardware and software defense mechanisms against side-channel cryptanalysis techniques. Compared to other open-source processors, its hardware description follows a high-quality design style that does not rely on technology-specific libraries. However, most of these open-source projects were not specifically developed to target FPGA architectures and therefore cannot match the performance of the cores provided by FPGA vendors [8]. This observation motivates the study of an open-source soft-core processor optimized for FPGAs.

B. Related Work

As part of processor optimizations for FPGAs, the high-performance 32-bit Arithmetic Logic Unit (ALU) of the Nios II has been presented in [9]. While the ALU of ASIC processors is usually constructed with separate units, that of the Nios II was designed to be small and compact in order to exploit the features and capabilities of Altera's FPGAs by merging the functionality of several units with multiplexers into the Look-Up Tables (LUTs). In addition, in [10], a promising high-performance processor for Digital Signal Processing (DSP) applications has been discussed. Its architecture was carefully optimized for Xilinx's Virtex-4 devices and, like the Nios II, the ALU of the processor was designed to maximize the utilization of 4-input LUTs. A partial forwarding scheme with OR gates was also implemented to reduce the usage of multiplexers. Although these approaches have shown significant area and speed benefits, they suffer from a lack of portability and maintainability, which limits reuse opportunities.
C. Contribution and Outline of the Paper

In the present work, we instead propose optimizing a soft-core processor at the Register Transfer Level (RTL) to enhance the quality and the attractiveness of the design. The SecretBlaze, originally developed in our research group, is used throughout this study to evaluate the proposed improvements. The rest of this paper is organised as follows. Section II focuses on the design and implementation of an FPGA-optimized soft-core processor. To illustrate this point, several architectural enhancements of the SecretBlaze processor are introduced in detail. The evaluation of the resulting design is then presented in Section III. In particular, the Dhrystone benchmark is used to measure the integer performance of the new architecture and to provide a basis for comparison. Finally, a conclusion is drawn in Section IV.
II. RTL OPTIMIZATIONS FOR FPGA ARCHITECTURES

A. A Case Study: The SecretBlaze

The SecretBlaze is a highly configurable open-source soft-core processor described in VHDL [8]. Based on the instruction set of Xilinx's MicroBlaze, it provides several optional logical and integer instructions as well as a simplified memory sub-system with customizable data and instruction caches, implementing the pipelined Wishbone protocol for external memory interfaces [7]. In the following sub-sections, we introduce several design and implementation strategies to enhance the performance of a RISC-based processor mapped onto FPGA technology, choosing the SecretBlaze as a case study. Our design goal was to offer an open-source implementation of an embedded processor with a performance level close to that of solutions developed by FPGA vendors.

processes by computing partial products P. Since most FPGAs provide 18-bit × 18-bit signed multiplier blocks, it is appropriate to split the multiplication into operations on 16-bit operands. A sign extension must nonetheless be implemented to perform 16-bit unsigned/signed multiply operations with 18-bit × 18-bit signed multiplier blocks. Hence, multiplying A with B can be computed according to equations (1a), (1b), and (1c). Least significant half-words are indicated with a zero, whereas most significant half-words are indicated with a one. Note also that the partial product of the most significant half-words, as well as the most significant half-words of the cross partial products, are not required for a standard 32-bit result multiplication.
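The half-word decomposition described above can be illustrated with a small behavioural model. The sketch below is written in Python rather than VHDL and is not taken from the SecretBlaze sources. The names `extend18`, `mul18x18`, and `mul32_low` are hypothetical, and treating the most significant half-words as signed is an assumption made only for illustration, since the low 32 bits of the product are the same either way.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def extend18(half, signed):
    """Extend a 16-bit half-word so that it fits an 18-bit signed operand.

    Hypothetical helper: an unsigned half-word is zero-extended, a signed
    one is sign-extended. Both results lie within the 18-bit signed range.
    """
    half &= MASK16
    if signed and (half & 0x8000):
        return half - 0x10000  # sign extension yields a negative value
    return half  # zero extension yields a non-negative value


def mul18x18(a, b):
    """Behavioural model of an 18-bit x 18-bit signed multiplier block."""
    assert -(1 << 17) <= a < (1 << 17)
    assert -(1 << 17) <= b < (1 << 17)
    return a * b


def mul32_low(a, b):
    """Low 32 bits of a 32-bit x 32-bit product from three partial products."""
    a &= MASK32
    b &= MASK32
    # Index 0 is the least significant half-word, index 1 the most significant.
    a0 = extend18(a, signed=False)
    b0 = extend18(b, signed=False)
    a1 = extend18(a >> 16, signed=True)  # assumption: high halves taken as signed
    b1 = extend18(b >> 16, signed=True)

    p00 = mul18x18(a0, b0)  # product of the least significant half-words
    p10 = mul18x18(a1, b0)  # cross partial product
    p01 = mul18x18(a0, b1)  # cross partial product
    # The product a1 * b1 is never computed, and only the low half-word of
    # the summed cross partial products contributes to the 32-bit result.
    cross_low = (p10 + p01) & MASK16
    return (p00 + (cross_low << 16)) & MASK32
```

For any pair of 32-bit operands, `mul32_low(a, b)` should equal `(a * b) & 0xFFFFFFFF`, which is why three multiplier blocks suffice when only a 32-bit result is needed.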
(1a) A ∗ B = (A1 A0 ) ∗ (B1 B0 ) = ((A1 ∗ B0 + A0 ∗ B1 )