Logarithmic Arithmetic for Real Data Types and Support for Matlab/Simulink Based Rapid-FPGA-Prototyping

Zdenek Pohl, Jan Schier, Miroslav Licko, Antonin Hermanek, Milan Tichy, Rudolf Matousek, Jiri Kadlec

1 Institute of Information Theory and Automation, Department of Signal Processing, Pod vodarenskou vezi 4, Prague 8, Czech Republic

2 Department of Control Systems, FEE CTU, Center for Applied Cybernetics, Karlovo nam. 13, Prague 2, Czech Republic

Abstract The paper focuses on rapid prototyping for FPGAs using the high-level environment of MATLAB/Simulink. An approach using the Xilinx System Generator (XSG) is reviewed on the example of the High-Speed Logarithmic Arithmetic (HSLA) unit. An alternative approach, which combines the Real-Time Workshop (RTW) with the Handel-C compiler for automated generation of the HDL code, is presented. Finally, the possibilities of extending this solution to support run-time reconfiguration are outlined.

1 Introduction

Today, the development of DSP applications requires a multi-level approach. On the one hand, we deal with the algorithm and system design. At this level, Matlab/Simulink is used very often. The well-known advantages of this environment are its high-level scripting language with strong support for matrix operations, its extensive graphing possibilities, and its rich set of toolboxes. With all these features, it is very well suited for rapid prototyping and design of a broad range of engineering applications and is thus quite popular in both academia and industry.

On the other hand, there is the implementation level: often, the target environment is an embedded system with an FPGA circuit. The designer of the target implementation will then use a Hardware Description Language (HDL) such as VHDL or Verilog. That means, however, that the application has to be rewritten, which is itself an error-prone process; as a result, we get two versions of the code (in different languages), which are difficult to keep in sync. It is also quite difficult to get the DSP team and the implementation team to cooperate successfully.

There is an increasing number of attempts to solve this problem, either by allowing the DSP engineer and the hardware designer to use one programming language (Handel-C from Celoxica, SystemC developed by the Open SystemC Initiative, OSCI) or by automatic code generation from the Simulink model. In our paper, we focus on this second option. Our approach is based on the Xilinx System Generator for DSP (XSG) and its black-box based methodology for developing extensions. We review this approach on the example of a High-Speed Logarithmic Arithmetic (HSLA) unit, which has been developed in our group. Then we propose a new methodology for developing the XSG extensions (which have to be HDL-coded), using a combination of the Real-Time Workshop (RTW) from MathWorks with the DK1 Handel-C toolkit from Celoxica. Finally, the possibilities of hardware-software co-design and co-simulation are pointed out.

2 Xilinx System Generator for DSP

The aim of the System Generator [10] is to allow the design of soft Intellectual Property (IP) cores for Field Programmable Gate Arrays (FPGAs) using MATLAB and Simulink. It contains a library of blocks for bit- and cycle-accurate simulations and makes it possible to generate the HDL code automatically from Simulink block diagrams built from the blocks of this library (see Figure 1). The resulting HDL code can be synthesised and implemented in an FPGA. The System Generator generates the command batches for FPGA synthesis, logic simulation and the implementation tools (Place and Route), as well as all the files necessary for functional verification (test bench) and the timing specifications (constraints). Extensions to the System Generator can be written in either VHDL or Verilog (in our designs, we have used VHDL). In the following text, we use the XSG "Black-box" approach to extend the blockset with the logarithmic arithmetic functions.

Figure 1. Design Flow using the Xilinx System Generator for DSP

0-7695-1926-1/03/$17.00 (C) 2003 IEEE

3 High Speed Logarithmic Arithmetic (HSLA) blockset

The System Generator blockset does not contain any floating-point operations (which are difficult and expensive to implement). For a DSP application, this may be a drawback affecting its performance. The HSLA toolbox, developed in our group [1], [2], [6], [8], offers a viable way to implement arithmetic operations with the precision of the floating-point (IEEE 754) representation. It consists of a C-coded library of logarithmic arithmetic functions and their HDL-based IP core equivalents. This gives broad application possibilities: the functions can be used for bit-exact simulations in Matlab, Simulink or stand-alone C programs, as well as for testing the actual FPGA implementation. See Figure 2 for the 19-bit HSLA library and Table 1 for an overview of the basic logarithmic operations.
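The idea behind the logarithmic operations can be illustrated with a small floating-point model. The sketch below is not the HSLA C library: the function names are hypothetical, base 2 is assumed, the 19-bit encoding is ignored, and the addition function f_a(r) = log2(1 + 2^r) is the usual textbook definition, which is assumed here rather than taken from the paper.

```python
import math

# Illustrative log-domain model (assumptions: base 2, floating point,
# no 19-bit encoding, no special code for zero).

def to_log(a):
    """Encode a non-zero real number as (is_negative, log2|a|)."""
    if a == 0:
        raise ValueError("zero needs a special code, not modelled here")
    return (a < 0, math.log2(abs(a)))

def from_log(x):
    """Decode (is_negative, log2|a|) back to a real number."""
    negative, t = x
    value = 2.0 ** t
    return -value if negative else value

def lmul(x, y):
    # A * B becomes an addition of logarithms; the signs combine by XOR.
    return (x[0] != y[0], x[1] + y[1])

def ldiv(x, y):
    # A / B becomes a subtraction of logarithms.
    return (x[0] != y[0], x[1] - y[1])

def lsqrt(x):
    # sqrt(A) halves the logarithm (a right shift by one in fixed point).
    if x[0]:
        raise ValueError("square root of a negative number")
    return (False, x[1] / 2.0)

def ladd(x, y):
    # A + B = L_A + f_a(L_B - L_A), with f_a(r) = log2(1 + 2**r) (assumed).
    # Only operands of equal sign are handled in this sketch.
    if x[0] != y[0]:
        raise ValueError("mixed signs need the subtraction function")
    if x[1] < y[1]:
        x, y = y, x  # keep r <= 0 so that 2**r cannot overflow
    r = y[1] - x[1]
    return (x[0], x[1] + math.log2(1.0 + 2.0 ** r))
```

Multiplication, division and square root reduce to an addition, a subtraction and a halving of the logarithm, whereas addition requires the evaluation of a non-linear function. This difference is consistent with the much larger resource figures that the paper reports for the addition and subtraction cores.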

4 XSG implementation of the HSLA blockset

As stated in Section 2, the HSLA blockset is implemented in the System Generator using the "Black-box" mechanism. See Figure 3 for an example: it shows the implementation of the logarithmic multiplication (LMUL) block from Figure 2. This block has a latency of two and is not pipelined (see Table 2 for the parameters of all basic operations). The in/out ports (LA, LB, LY) of the Black Box subsystem are connected to the XSG Gateway Out/In blocks. Hence, the functionality of the Black Box can be described (for simulation) using standard Simulink blocks: the latency of the operation is modelled using a combination of Delay blocks and a Multiport Switch; the functionality at the bit level is modelled by a mex-function lmul embedded in the "Matlab function" block. During the code generation phase, the wrapper contained in the Black Box (lmul19 wrapper in Figure 3) replaces the whole part between the in/out ports (LA, LB, LY). The wrapper typically contains only interfaces to the IP cores where the functionality is already described. This allows the represented block to be replicated multiple times in the same Simulink model. Using this approach, we have encapsulated the HDL representation of all the basic HSLA operations (see the overview in Table 2) and added the floating-point operations to the System Generator.

Figure 2. HSLA blockset for the Xilinx System Generator

Table 1. An overview of the logarithmic equivalents of the basic arithmetic operations

  Operation      Logarithmic equivalent
  $A \times B$   $L_Y = L_A + L_B$
  $A \div B$     $L_Y = L_A - L_B$
  $A + B$        $L_Y = L_A + f_a(L_B - L_A)$
  $A - B$        $L_Y = L_A - f_s(L_B - L_A)$
  $\sqrt{A}$     $L_Y = L_A \gg 1$
  $A^2$          $L_Y = L_A \ll 1$

  $L_A = (1 - Z) \cdot (-1)^{S_A} \cdot 2^{T_A}$, $Z = 2^{19} - 1$, $S_A = \mathrm{sgn}(A)$, $T_A = \log_b |A|$


Figure 3. Cycle- and bit-exact implementation of the logarithmic multiplication (LMUL)

Table 2. Implementation details of 19-bit HSLA operations for Virtex2000E-6 on the RC1000 card (addition and subtraction use 8 Virtex BRAMs)

  Operation       #Slices      #Equiv. gates  Max. freq. [MHz]  Latency [CLK]  Pipelined
  Multiplication  218 (1%)     3 844          64                2              No
  Division        235 (1%)     4 016          54                2              No
  Square root     175 (1%)     3 170          59                2              No
  Addition        1 478 (9%)   132 410        38                9              Yes
  Subtraction     1 789 (9%)   132 410        38                9              Yes
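The way a fixed latency can be modelled for cycle-accurate simulation may be sketched with a simple delay line. This is only an illustration of the principle: the class name and interface are hypothetical, the Simulink model described above uses Delay blocks and a Multiport Switch instead, and the sketch accepts new operands on every clock cycle, which a non-pipelined core would not.

```python
from collections import deque

class FixedLatencyBlock:
    """Behavioural model of a block whose result becomes visible
    `latency` clock cycles after its operands were applied."""

    def __init__(self, func, latency, fill=None):
        if latency < 1:
            raise ValueError("latency must be at least one clock cycle")
        self.func = func
        # The delay line initially holds `fill` (no valid data yet).
        self.pipe = deque([fill] * latency, maxlen=latency)

    def clock(self, *operands):
        """Advance one clock cycle and return the value at the output."""
        out = self.pipe[0]                       # oldest stored result
        self.pipe.append(self.func(*operands))   # maxlen drops the oldest
        return out

# Usage: a log-domain multiplier (an addition of logarithms), latency of two.
lmul_block = FixedLatencyBlock(lambda la, lb: la + lb, latency=2)
outputs = [lmul_block.clock(a, b) for a, b in [(1, 2), (3, 4), (5, 6), (7, 8)]]
# outputs == [None, None, 3, 7]
```

Separating the arithmetic function from the delay line mirrors the split described in Section 4, where the bit-level behaviour and the timing behaviour are modelled by different Simulink blocks.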

5 Using Simulink to prepare FPGA testbench for the HSLA blockset

Using the System Generator, it was also possible to set up a Simulink testbench for evaluating the HSLA operations in the FPGA hardware. For this testbench, the RC1000 card from Alpha Data, Ltd. [4] was used as the target hardware platform (see Figure 4). The Gateway blocks separate the inner part of the model, which is designed using the System Generator blockset (the Subsystem block in Figure 4) and is used to generate the HDL code, from the rest of the model. In our case, the System Generator block controls the generation of the code from the Subsystem block. A handshake algorithm is added to the final generated files. This handshake was used to verify all the basic operations listed in Table 2 on a PC with the RC1000 card. The parameters in the first three columns (#Slices, #Equiv. gates, Max. freq.) correspond to the implementation in the card (the utilised resources include all resources used by the handshake part of the design located behind the wrapper; see Figure 4). An illustrative example of a computation designed using the HSLA toolbox is presented in Figure 5. The latency between the input In1 and the LDIV block and the unused part of the LADD block (the logarithmic addition is a dual-issue block) are handled using the Delay blocks.

6 Designing XSG extensions using high-level tools

The Black-box approach used in the System Generator has one drawback: all extensions have to be designed in HDL code. In this section, we propose an approach that combines the Real-Time Workshop (RTW) with the Handel-C compiler. The modified design flow is shown in Figure 6; the lower-left part is an extension to the flow presented in Figure 1. The Real-Time Workshop [9] is a Simulink tool that is used to compile Simulink models into a high-level language (HLL), most often C code (see the upper part of Figure 9). The output of the Real-Time Workshop is controlled by the Target Language Compiler (TLC), which is an integral part of the RTW. Thus, by modifying the parser used by the TLC, it is possible to generate code in other languages. In our case, the target is Handel-C, the HLL used by the DK1 development environment from Celoxica [5]. For our purpose, it is important that this tool can be used to synthesise HDL code or to generate an EDIF netlist from the original HLL description. In our solution, we have utilised the TLC to parse the Simulink block diagram and to generate Handel-C code as the output. This Handel-C based extension was tested on the example of an FPGA implementation of the nth-order QRD-RLS algorithm (see Figure 7). The implementation results with the RC1000 card are summarised in Table 3.
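The general principle of template-driven code generation from a block diagram can be sketched as follows. This is neither TLC syntax nor Handel-C output: the model description, the templates and the function name are hypothetical, and the emitted text consists of generic C-like statements chosen only to show how blocks are visited in dependency order and expanded from per-block templates.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

# Hypothetical block diagram: signal name -> (block type, input signals).
# u1, u2 and u3 are external inputs; they have no entry of their own.
MODEL = {
    "y":    ("LADD", ["quot", "u1"]),
    "quot": ("LDIV", ["prod", "u3"]),
    "prod": ("LMUL", ["u1", "u2"]),
}

# Hypothetical per-block templates, playing the role of block scripts.
TEMPLATES = {
    "LMUL": "{out} = lmul({a}, {b});",
    "LDIV": "{out} = ldiv({a}, {b});",
    "LADD": "{out} = ladd({a}, {b});",
}

def generate(model, templates):
    """Emit one statement per block, producers before consumers."""
    # Only signals produced by another block count as dependencies.
    deps = {
        name: [sig for sig in inputs if sig in model]
        for name, (_, inputs) in model.items()
    }
    lines = []
    for name in TopologicalSorter(deps).static_order():
        block_type, (a, b) = model[name]
        lines.append(templates[block_type].format(out=name, a=a, b=b))
    return lines

code = generate(MODEL, TEMPLATES)
# code == ["prod = lmul(u1, u2);",
#          "quot = ldiv(prod, u3);",
#          "y = ladd(quot, u1);"]
```

Keeping the templates separate from the traversal is what makes retargeting possible in such a scheme: a different set of templates yields output in a different language without touching the traversal of the diagram.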


Figure 4. Support for RC1000 card

Figure 5. Utilisation of HSLA and XSG blocks

Figure 6. Design flow for Rapid-Prototyping of the HDL-based Black Boxes

Figure 7. Adaptive identification QRD RLS

The two versions of the 2nd-order QRD-RLS presented in Table 3 differ in the way they utilise the LADD block (which contains two adders). The 2xQRD version duplicates the 1xQRD algorithm but uses both parts of the dual LADD (see Figure 2) to double the sampling frequency. See [1] for more information on the implementation.

7 Future development: HW/SW Co-Design

In complex signal processing applications, the HSLA-based approach can be used as a sort of floating-point coprocessor for the DSP: the load of the signal processor is decreased by allocating parts of the code in the FPGA circuit (see (1) and (2) in Figure 8). This is, in fact, the solution that has been presented in this paper. However, using the combination of RTW and XSG (as described in the preceding section), we can go further: the XSG will be used for the system-level implementation on one or more FPGAs, while the RTW is used to co-design the control program for the controlling DSP/CPU (see (3) in Figure 8). Finally, it will be possible to simulate the Run Time Reconfiguration schedule (RTR, see [7]) and, using the RTW, to compile it for the target implementation. The concept for two FPGA chips is shown in (4) in Figure 8.

With a soft CPU such as MicroBlaze from Xilinx (for more details see [10]), it will even be possible to extend this approach so that the high-level tools can be used for System on Chip (SoC) design (see Figure 10). The concept of utilisation of the HSLA is at the stage of designing IP cores for DSPs. Some results are summarised in Table 4. For up-to-date information please see [1].

Table 3. Implementation details of the 19-bit HSLA based QRD-RLS for Virtex2000E-6 on the RC1000 card with low optimisation of the design (addition and subtraction use 8 Virtex BRAMs)

  Algorithm  #Slices      #Equiv. gates  Max. freq. [MHz]  Latency [CLK]  Max. sampl. freq. [kHz]
  1xQRD      2 250 (18%)  106 376        18                52             356
  2xQRD      4 128 (33%)  141 149        19                52             730

Figure 8. (1) The RTW concept is similar to the presented XSG concept and is widely used for DSPs (see Figure 9); (2) the XSG concept serves for FPGA implementation; part of the CPU code can be removed and implemented in the FPGA by exploring the structure of the algorithm (parallelism); (3) the mixed RTW-XSG concept supports co-design and co-simulation even on-the-fly; currently we support RTR, and with this concept we can address PRTR as well; (4) the problem depicted for a multi-FPGA based system

8 Conclusion

We have presented a methodology for using the Simulink environment for rapid prototyping of DSP systems where part of the code is allocated in the FPGA circuit. Two approaches were discussed. First, the Black-box approach of the Xilinx System Generator was demonstrated on the example of the implementation and verification of the High-Speed Logarithmic Arithmetic (HSLA). Second, a methodology combining the Real-Time Workshop, with a custom-designed Target Language Compiler (TLC) parser configuration, and the Handel-C compiler from Celoxica was presented. The advantage of this approach is that it creates an automatic link from the Simulink model to the HDL code, which can, in turn, be used to extend the System Generator blockset. Finally, using the Stateflow tool, it will even be possible to extend this approach to model partial run-time reconfiguration (PRTR) problems. This option and others are currently being studied in our group within the IST funded RECONF project [3].

This work has been partially supported by the Ministry of Education of the Czech Republic under Project LN00B096 and by the EU Project Reconf through the IST Programme IST-2001-34016.

Figure 9. Complete design flow for the HW/SW co-design

Figure 10. HW/SW co-design and related RTW-XSG utilisation on the example of HSLA, i.e. floating-point processing; both ASIC (on-board CPU) and SoC (soft CPU) can be supported

Table 4. Implementation details of 19-bit HSLA based RLS Lattice filters of the order 252. The maximal sampling frequency of the implementation on Virtex 2000E-6 is compared to the TI C6711 floating-point DSP

  RLS Lattice                      Freq. [MHz]     Sampl. freq. [kHz]
  filter type  #Slices  #BRAMs     Virtex  TI      Virtex  TI
  mono         67%      35%        52      167     16.7    0.6
  stereo       34%      14%        55      167     17.1    1.3

References

[1] HSLA homepage of the Department of Signal Processing. [available online: http://www.utia.cas.cz/ ZS/home.php?ids=hsla, accessed: 18.10.2002].

[2] HSLA Project Homepage: ESPRIT programme, Long-Term Research project 33544. [available online: http://napier.ncl.ac.uk/HSLA, accessed: 18.10.2002].

[3] RECONF Project Homepage: IST programme 2001-34016. [available online: www.reconf.org, accessed: 18.10.2002].

[4] Alpha Data Parallel Systems Ltd. ADC-RC1000 card. [available online: http://www.alpha-data.com/adcrc1000.html, accessed: 18.10.2002].

[5] Celoxica Ltd. Handel-C. [available online: http://www.celoxica.com, accessed: 18.10.2002].

[6] J. N. Coleman, E. Chester, C. I. Softley, and J. Kadlec. Arithmetic on the European Logarithmic Microprocessor. IEEE Trans. Comput. Special Edition on Computer Arithmetic, 49(7):702–715, July 2000.

[7] S. A. Guccione and D. Levi. The Advantages of run-time reconfiguration. Xilinx. [available online: http://www.io.com/ guccione/Papers/RTR/RTR.ps.gz, accessed: 18.10.2002].

[8] A. Hermanek, R. Matousek, M. Licko, and J. Kadlec. FPGA implementation of logarithmic unit. Proc. of Matlab, ISBN 80-7080-401-7:84–90, 2000. [available online: http://phobos.vscht.cz/matlab00/hermanek.pdf, accessed: 18.10.2002].

[9] The MathWorks Inc. Real-Time Workshop. [available online: http://www.mathworks.com/ access/helpdesk/ help/toolbox/rtw/rtw.shtml, accessed: 18.10.2002].

[10] Xilinx Inc. Xilinx System Generator and Micro Blaze. [available online: http://www.xilinx.com, accessed: 18.10.2002].
