Document not found! Please try again

Customizing 16-bit floating point instructions on a NIOS II ... - LRI

0 downloads 0 Views 186KB Size Report
Abstract. We have implemented customized SIMD 16-bit floating point instructions on a NIOS II processor. On several image processing and media benchmarks ...
Customizing 16-bit floating point instructions on a NIOS II processor for FPGA image and media processing Daniel Etiemble *, Samir Bouaziz ** and Lionel Lacassagne** *LRI, ** IEF, University of Paris Sud 91405 Orsay, France {[email protected], [email protected] ; [email protected] } Abstract We have implemented customized SIMD 16-bit floating point instructions on a NIOS II processor. On several image processing and media benchmarks for which the accuracy and dynamic range of this format is sufficient, a speed-up ranging from 1.5 to more than 2 is obtained versus the integer implementation. The hardware overhead remains limited and is compatible with the capacities of today FPGAs.

1. Introduction Graphics and media applications have become the dominant ones for general purpose or embedded microprocessors. While some applications need the dynamic range and accuracy of 32-bit FP numbers, a general trend is to replace FP by integer computations for better performance in embedded applications for which hardware resources are limited. In this paper, we show that 16-bit FP computations can produce a significant performance advantage over integer ones for significant image processing benchmarks using FPGA with soft core processors while limiting the hardware overhead. By customizing SIMD 16-bit instructions, we significantly improve performance over integer computations without needing the hardware cost of 32-bit FP operators.

1.1 16-bit floating point formats 16-bit floating formats have been defined for some DSP processors, but rarely used. Recently, a 16-bit floating point format has been introduced in the OpenEXP format [1] and in the Cg language [2] defined by NVIDIA. This format, called “half”, is presented in Figure 1. 1 S

5 Exponent

10 Fraction

Figure 1: NVIDIA “half” format A number is interpreted exactly as in the other IEEE FP formats. The exponent is biased with an excess value of 15. Value 0 is reserved for the representation of 0 (Fraction =0) and of the denormalized numbers (Fraction ≠ 0). Value 31 is reserved for representing infinite (Fraction = 0) and NaN (Fraction ≠ 0). For 0