FPGA-in-the-Loop Implementation of an Adaptive Matrix Inversion Algorithmic Co-Processor: An Embedded Dual-Processor System

Vincent A. Akpan*

Abstract

This article presents a comprehensive and efficient model-based technique for developing, synthesizing, modeling, pre-verifying and implementing algorithms on embedded processor platforms that consist of a personal computer and a field programmable gate array (FPGA). To illustrate the proposed technique, a new adaptive matrix inversion algorithm is proposed and used. The algorithm is first implemented as synthesizable streaming-loop floating-point MATLAB programs. The MATLAB programs are then synthesized using Xilinx AccelDSP to generate a System Generator block model equivalent of the MATLAB programs. Using the generated System Generator block model, the Xilinx System Generator for DSP is then employed to develop a complete System Generator hardware model of the adaptive matrix inversion algorithm. An FPGA-in-the-loop co-simulation and pre-verification using a generated hardware co-simulation block model is carried out for performance comparison. Next, an embedded MicroBlaze™ processor system is designed, tested and imported into the System Generator hardware model of the adaptive matrix inversion algorithm inside the MATLAB/Simulink environment, and a complete FPGA-in-the-loop implementation is performed. The FPGA-in-the-loop simulation results are presented. Conclusions drawn from the study are given together with some discussion and directions for further work.
Keywords: Adaptive matrix inversion algorithm, embedded co-processors, embedded systems design, field programmable gate array (FPGA), FPGA-in-the-loop, hardware/software co-design, System Generator block model, System Generator hardware model.
Introduction

Distributing computationally intensive algorithms over different computing and co-processing platforms for real-time applications is very common in high-performance computing communities. Recently, field programmable gate arrays (FPGAs) have been proposed as an efficient computing platform [1-4]. One of the most exciting developments in FPGAs has been the emergence of hard and soft FPGA-embedded processors [5]. Embedded processors are important advances that have made the single-chip FPGA-based system a practical computing platform. There are several advantages of embedded processors inside an FPGA [5-7]. Generally, an FPGA embedded processor system may be defined as software implemented in hardware to realize specific real-time functionalities, with perhaps additional parts, and it can also form part of a larger system or product with some kind of connection to the outside world.

The hardware and software components of an FPGA embedded system present difficulties for complex problems. First, the difficulty of generating a design from a set of requirements increases as the system becomes more complex. Furthermore, translating a set of design specifications into a computer-parsable format presents a number of problems [8]. In addition, there are challenges to using FPGAs as a software platform due to the historical disconnect between software development methods and the lower-level methods required for hardware design, including embedded system designs using FPGAs [6, 7]. For example, software programmers may not have the necessary skills to make use of hardware design tools or hardware-oriented languages such as Verilog or VHDL (Very high speed integrated circuit Hardware Description Language). Software programmers using FPGAs may also be faced with design methodologies that are new and unfamiliar, including the need to partition efficiently between hardware and software. The same arguments may hold for hardware designers who are not familiar with software programming.

The current trend for FPGA-based embedded system design has been to allow for quick implementation without the need to understand all the intricate details of synthesis and implementation. At the same time, access to low-level features such as custom instructions, built-in functions/macros, assembly languages and an appropriate hardware description language to achieve the maximum possible design performance is now possible. However, instead of the traditional hierarchical design techniques which begin at the complicated lower-abstraction register transfer level (RTL) [9], this article begins at a higher-abstraction model-based algorithmic level and then combines the electronic system level (ESL) to arrive at the RTL before design implementation. The fate of high-level modeling is inextricably intertwined with the fate of ESL design. ESL design has been discussed for many years, yet it has hardly become the natural starting point of the design process for most designs. In the hardware domain, most design activities start at the RTL with a textual specification passed from on high. In the software domain, most design activities start with writing C code in an integrated development environment. However, with the advent of C-based FPGA design tools [7, 11, 12], it is now possible to use familiar software design tools such as MATLAB and standard C languages for a much larger percentage of the embedded system design, and in particular for those parts of the design that are computationally intensive. Due to its capability for visualizing results, MATLAB is adopted as the software tool for the model-based algorithm development.

* Department of Physics Electronics, The Federal University of Technology, P.M.B. 704 Akure, Ondo State, Nigeria.

© ADR Journals 2015. All Rights Reserved.
In order to demonstrate the FPGA embedded system design technique proposed in this article, an adaptive matrix inversion algorithm is proposed. Matrix inversion is an important computation in linear and nonlinear optimization in diverse application areas such as neural networks, neuro-fuzzy systems, digital signal processing, system identification, parameter estimation, adaptive control, recursive algorithms, etc. The algorithm proposed here checks and ensures that a given matrix is well-conditioned, non-singular and positive definite before the inversion is performed.
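The kind of pre-inversion checking described above can be illustrated with a small sketch. It is not the algorithm proposed in this article: it assumes a symmetric matrix, uses a Cholesky factorization as the positive-definiteness test, and uses an infinity-norm condition number with an arbitrary threshold (`max_cond`) as the ill-conditioning test. All function names and the threshold are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T, or None when a pivot is
    not strictly positive (matrix singular or not positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def invert_from_cholesky(low):
    """Invert L * L^T column by column with forward and back substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        y = [0.0] * n
        for i in range(n):  # forward solve L y = e_col
            rhs = 1.0 if i == col else 0.0
            y[i] = (rhs - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
        x = [0.0] * n
        for i in reversed(range(n)):  # back solve L^T x = y
            tail = sum(low[k][i] * x[k] for k in range(i + 1, n))
            x[i] = (y[i] - tail) / low[i][i]
        for i in range(n):
            inv[i][col] = x[i]
    return inv


def inf_norm(m):
    """Infinity norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


def guarded_inverse(a, max_cond=1e8, sym_tol=1e-12):
    """Return (inverse, condition_number); inverse is None when a check fails.
    max_cond and sym_tol are illustrative assumptions, not values from the article."""
    n = len(a)
    symmetric = all(abs(a[i][j] - a[j][i]) <= sym_tol
                    for i in range(n) for j in range(i))
    if not symmetric:
        return None, float("inf")
    low = cholesky(a)
    if low is None:
        return None, float("inf")
    inv = invert_from_cholesky(low)
    cond = inf_norm(a) * inf_norm(inv)
    if cond > max_cond:
        return None, cond
    return inv, cond
```

For example, `guarded_inverse([[4.0, 2.0], [2.0, 3.0]])` returns an inverse, whereas the singular matrix `[[1.0, 2.0], [2.0, 4.0]]` is rejected because the second Cholesky pivot is zero.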
J. Adv. Res. Embed. Sys. 2015; 2(1):28-63.
Akpan VA
The article is organized as follows. In Section II, the adaptive matrix inversion algorithm is presented. Section III gives an overview of the proposed FPGA design methodology. The synthesis of the adaptive matrix inversion algorithm is presented in Section IV, whereas the System Generator hardware model of the adaptive matrix inversion algorithm follows in Section V. FPGA-in-the-loop co-simulation is the subject of Section VI. The embedded MicroBlaze™ processor system development and testing is the subject of Section VII. Then, in Section VIII, the embedded MicroBlaze™ is imported and integrated with the System Generator hardware model of the adaptive matrix inversion algorithm, followed by a complete FPGA-in-the-loop implementation and performance verification. A brief conclusion, discussion and directions for further work are given in Section IX.
The FPGA Implementation Example Problem: Adaptive Matrix Inversion Algorithm

Handling Ill-Conditioned, Singular and Non-Positive Definite Matrix

Given a square coefficient matrix A, an obvious question is: what happens when A is almost singular, i.e., when the determinant of the matrix, |A|, is very small? In order to determine whether the determinant of the coefficient matrix is “small”, we need a reference against which the determinant can be measured. This reference is called the norm of the matrix, denoted by ||A||. Then, the determinant can be said to be small if |A|
[Figure 10 block diagram (image not reproduced). Recoverable block and signal labels: A_MATRIX, AMATRIX_ROM, AMAT_ShareMem, CH_FACTORS, CH_FACTORS_OUT, PDHM_INV, PDHM_INV_OUT, PDH_MATRIX_OUT, CON_NUM, CON_NUM_OUT, SN_NUM, SN_NUM_OUT, IN_OUT_SEQ, OUT_SEQ, OUTSEQ_ROM, OUTSEQ_ShareMem, VIEW_INA, VIEW_OUT_SEQ, FUNC_INV_PDHM, TO_Reg_CH, TO_Reg_INV, TO_Reg_CON, TO_Reg_SN, EN_INReg, EN_OUTPUTS, IN_SCOPE, OUT_SCOPE, HW_CoSim_IN_SCOPE, HW_CoSim_OUT_SCOPE, and the JTAG Co-sim block INV_POSDM_MODEL hwcosim. Recoverable data types: Fix_21_16, Fix_29_25, UFix_32_14, UFix_8_0, UFix_5_0, UFix_1_0, uint8, Bool, double.]
Figure 10. The System Generator hardware model for the adaptive matrix inversion algorithm with the generated Hardware Co-Simulation block model.

Comparing the Simulink simulation results of fig. 11(a) and (b) for the System Generator hardware model of the adaptive matrix inversion algorithm with those obtained in fig. 11(c) and (d), respectively, by the hardware co-simulation block model shows almost identical results, which indicates that the System Generator hardware model of the algorithm will perform well when implemented on the Virtex-5 ML507 FPGA. The device utilization summary for the System Generator hardware model is given in Appendix A and shows the efficiency of the synthesized System Generator hardware algorithmic model of the adaptive matrix inversion algorithm.
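A comparison between double-precision simulation outputs and fixed-point hardware outputs can be sketched as follows. The sketch relies on the System Generator naming convention in which `Fix_W_F` is a signed type with W total bits and F fractional bits, and `UFix_W_F` is the unsigned counterpart. The round-to-nearest and saturate behaviour, the one-LSB tolerance and the function names are assumptions made for illustration; they are not taken from the model in this article.

```python
import math


def quantize(x, word_bits, frac_bits, signed=True):
    """Quantize x to a Fix_W_F (signed) or UFix_W_F (unsigned) value.
    Rounding to nearest and saturating on overflow are assumed here."""
    scale = 1 << frac_bits
    if signed:
        lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << word_bits) - 1
    code = int(math.floor(x * scale + 0.5))  # integer code, round to nearest
    code = max(lo, min(hi, code))            # saturate to the representable range
    return code / scale


def outputs_match(reference, hardware, word_bits, frac_bits):
    """True when every hardware sample is within one LSB of the reference."""
    lsb = 2.0 ** -frac_bits
    return all(abs(r - h) <= lsb for r, h in zip(reference, hardware))


if __name__ == "__main__":
    reference = [0.1, -0.25, 0.375]
    hardware = [quantize(v, 21, 16) for v in reference]  # Fix_21_16
    print(outputs_match(reference, hardware, 21, 16))
```

With 16 fractional bits the resolution is 2^-16, so "almost identical" waveforms in this sketch means agreement to within that step.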
Figure 11. Hardware-in-the-loop co-simulation of the matrix inversion System Generator model with the Virtex-5 ML507 FPGA board over a JTAG cable: (a) and (b) are the inputs and outputs of the true System Generator model, while (c) and (d) are the input and output waveforms produced by the generated Hardware Co-Simulation block running in the FPGA. The input and output signals are in the same order as defined in fig. 7.

The Embedded MicroBlaze™ Processor System Design and Testing

This section is devoted to the development of an embedded MicroBlaze™ processor system using the Xilinx EDK design tools, which consist of the XPS and the Xilinx SDK. Here, the hardware portion of the embedded MicroBlaze™ processor system is designed using the XPS via the Xilinx ISE™ Foundation, while the software portion for initializing all the peripheral and memory drivers of the embedded processor system is developed using the Xilinx SDK. The combined hardware and software portions constitute the embedded processor system, which is tested on a Virtex-5 FXT ML507 FPGA development board via the Xilinx SDK GUI for performance verification.

The embedded processor system design considerations outlined and discussed in some studies [5, 12, 27] are adopted throughout the embedded MicroBlaze™ processor system design presented in this work, and appropriate references are made where necessary.

Hardware Portion of the Embedded MicroBlaze™ Processor System Design

The embedded MicroBlaze™ processor system design using the MicroBlaze™ core is instantiated from the Xilinx ISE™, which then initializes the XPS where the actual design of the embedded MicroBlaze™ processor system is done.

The Xilinx ISE™ is started and the project name is assigned in the “New Project Wizard”. The name assigned here for the MicroBlaze™ processor system is “Emb_Proc_Sys”. Our available FPGA device family, the Virtex-5 XC5VFX70T, is selected; the speed grade for this FPGA device family is -1 and is specified together with the device package FF1136. The Xilinx synthesis tool (XST) is selected as the synthesis tool. ModelSim-SE is selected as the simulation tool. The language for the embedded processor system development is VHDL. Note that the same Virtex-5 FPGA properties as in fig. 4 for the AccelDSP case in Section IV are selected here. In addition to these selections, the Embedded Processor (Emb_Proc_Sys.xmp) template is also added as a “New Source” in the New Project Wizard. The “Emb_Proc_Sys” project summary is shown in fig. 12(a).

When the “New Project Wizard” is completed, the ISE™ initializes and automatically starts up the XPS, since “Embedded Processor” was added as a “New Source”. The XPS in turn initializes and brings up the Base System Builder (BSB), which is an automated tool used to create the embedded MicroBlaze™ processor system. The embedded processor system design using the BSB is an eight-stage procedure [5, 23, 27], namely: Welcome, Board, System, Processor, Peripheral, Cache, Application, and Summary, as listed respectively in the menu located at the top of the BSB GUI of fig. 12(b).

The “Summary” is the last stage of the BSB-guided procedure for creating the embedded processor system. This stage lists all the available peripherals associated with the newly created embedded processor system together with their instance names as well as the base and high addresses, as shown under System Summary in fig. 12(b). The “Summary” stage also lists the major software associated with the processor system, as shown under Overall in the File Location category in fig. 12(b). The components of the previous “Application” stage are also listed in the “Summary” stage dialog window.
Figure 12. (a) New Project Wizard “Project Summary” and (b) Base System Builder (BSB) “System Summary” for the embedded MicroBlaze™ processor system.
Next, the newly created embedded MicroBlaze™ processor system must be compiled so that all the memory types, peripherals, memory and peripheral driver software and the entire embedded processor system can be updated. The Xilinx ISE™ Foundation and the XPS tools are used interchangeably to perform these compilations, as summarized below:

1) Starting with the XPS, the board support packages (BSPs) and libraries are generated by selecting “Software → Generate Libraries and BSPs” on the XPS graphical user interface (GUI) shown in fig. 13.

2) Next, the netlist is generated by selecting “Hardware → Generate Netlist” from fig. 13. This stage of the design also generates all the “wrappers”, device drivers, and all the necessary design and technology files that are required by ISE™ for complete synthesis and implementation of the embedded processor system.

3) After the netlist generation, attention is turned to the Xilinx ISE™. A section of the Xilinx ISE™ GUI for the MicroBlaze™ embedded processor system design is shown in fig. 14. Note that during the netlist generation, the “User Constraint File (UCF)” was generated. The UCF file has the project name with a ucf extension, that is, “Emb_Proc_Sys.ucf”, and is always located in the directory “data” in the processor hierarchy. This file defines the constraints on the created processor system together with the input-output (I/O) mapping of the complete design to the Virtex-5 FX70T FPGA device family and the package selected in fig. 12(a). This file is introduced into the processor system by selecting “Project → Add Source” from the ISE™ GUI of fig. 14, navigating to the “data” directory and adding the “Emb_Proc_Sys.ucf” file.

4) Next, the programming file (bitstream) for the complete embedded MicroBlaze™ processor system is generated by double-clicking the highlighted (blue) “Generate Programming File” entry shown in fig. 14. This is the design implementation phase [12, 27].
Figure 13. The XPS graphical user interface (GUI) for the creation and initial compilation of the embedded MicroBlaze™ processor system.

Figure 14. ISE™ implementation of the embedded MicroBlaze™ processor system design using the XPS in the EDK environment together with the design summary.
The various stages of this design implementation phase are shown in the actual ISE™ design flow in the ISE™ GUI of fig. 14, which consists of seven major design implementation steps, namely: Step 1) User Constraints, Step 2) Synthesize – XST (Xilinx Synthesis Tool), Step 3) Implement Design, Step 4) Generate Programming File, Step 5) Configure Target Device, Step 6) Update Bitstream with Processor Data, and Step 7) Analyze Design Using ChipScope. Double-clicking “Generate Programming File” in fig. 14 executes steps 2, 3 and 4 to generate this file. Note that the XPS generated the UCF file, which takes care of step 1; otherwise the UCF file would have been created in step 1 using the Xilinx PlanAhead tool. Because the design is not yet ready for FPGA programming, steps 5, 6 and 7 are not executed here. However, as can be seen in fig. 14, the generation of the bitstream completed with all signals routed, all constraints met, and without errors but with some warnings, which is normal.

5) Note that the embedded processor design is coordinated by both the Xilinx ISE™ and the XPS. It is observed that immediately after the generation of the programming file (bitstream), the Xilinx ISE™ indicated that the project design is out of date while the XPS indicated that the project file has changed on disk, on their respective GUIs. Therefore, steps 1 to 4 are repeated to update the system, after which both notifications disappear. In addition to the programming file, an important file called the “Block Memory Map (BMM)” file, with extension bmm, is also generated. For the current MicroBlaze™ processor project, this file is edkBmmFile_bd.bmm. The BMM file is a text file that has syntactic descriptions of how individual block RAMs constitute a contiguous logical data space. The Xilinx Data2MEM [28] uses the .BMM files to direct the translation of all data into the proper initialization forms. Note that since the .BMM file is a text file, it is directly editable. This file, together with the bitstream and all the generated device drivers, will be required to program the Virtex-5 FPGA during the software design portion of the embedded processor system. The .BMM file is located in the top-level directory of the processor system together with the bitstream (with extension .BIT). For convenience, the design and device utilization summary for the embedded MicroBlaze™ processor system generated by the Xilinx ISE™ Foundation is given in Appendix B.

6) Since the embedded processor project is now fully updated by both the Xilinx ISE™ and the XPS, attention is again turned to the XPS shown in fig. 13 to perform the following tasks:

i) Generate the block diagram of the complete embedded MicroBlaze™ processor system by selecting from the XPS GUI of fig. 13: Project → Generate Block Diagram Image. The result is shown in fig. 15.

ii) Generate the complete embedded MicroBlaze™ processor system report by selecting from the XPS GUI of fig. 13: Project → Generate and View Design Report. This report gives detailed information on the embedded MicroBlaze™ processor system but is not shown in this work since it is more than 30 pages long. It is useful as a reference for accessing the different peripherals, memory types, and memory and peripheral drivers, especially when modifying, addressing and integrating custom hardware.

iii) Generate and export the designed embedded MicroBlaze™ processor hardware to the Xilinx software development kit (Xilinx SDK) by selecting from the XPS GUI in fig. 13: Project → Export Hardware Design to SDK. Although the Export dialog box offers two options for exporting the designed hardware, Export Only and Export and Launch SDK, “Export Only” is selected since the designed hardware will be imported into the SDK in the next subsection for memory and peripheral testing of the designed MicroBlaze™ processor system. This export process automatically creates an SDK directory in the current design hierarchy and places the hardware structure of the designed embedded MicroBlaze™ processor system (Emb_Proc_Sys.xml) as an XML (extensible markup language) document in the created SDK directory.
Software Portion of the Embedded MicroBlaze™ Processor System Design

Now that the embedded MicroBlaze™ processor system has been designed and placed in the SDK directory of the current project directory, the next step is to test the functionality of the peripherals and memories of the designed hardware using the Xilinx SDK. Here the “ProjectName” refers to the memory or peripheral test applications shown in fig. 12(b) and fig. 13.

Figure 15. Block diagram of the embedded MicroBlaze™ processor system with peripherals, memories and buses, with additional hardware/software specifications and keys to symbols.

The procedures for importing the pre-designed embedded MicroBlaze™ processor system and for software development using the Xilinx SDK can be outlined as follows [21, 23, 24, 27, 28]:

1. First, we import the designed embedded MicroBlaze™ processor system, which is the Emb_Proc_Sys.xml file available in the SDK directory, into the working directory.

2. A new Software Platform is created on top of the designed processor system for memory and peripheral staging, initializing all the memory and peripheral drivers.

3. A new Managed Make C Application Project is created for software development and implementation, one each for the memory and peripheral tests.

4. Save the project. Saving the project automatically builds the project and reports any errors, which are then corrected.

5. A linker script (ProjectName.ld) is automatically generated during the build process. However, if there is no linker script in the Managed Make C Application Project directory, one can right-click the Managed Make C Application and select “Generate Linker Script”. This option opens the “Generate Linker Script” dialogue box shown in fig. 16(a). Here, we change the memory allocation in the Generate Linker Script dialogue box to use a larger memory by setting “Assign all code Sections to:” to the onboard memory “SRAM_MEMO_BASEADDR”, as shown in the drop-down menu of fig. 16(a).

6. Next, we start a pre-configured HyperTerminal (ml507_vin) session, turn on the FPGA and program the FPGA by selecting Tools → Program FPGA.
a. Locate and include the bitstream file, i.e. the emb_proc_sys.bit file.
b. Locate and include the Block Memory Map file, i.e. the edkBmmFile_bd.bmm file.
c. Finally, BootLoop is selected and “Save and Program” is clicked. Since all conditions were met without errors, the Virtex-5 FPGA was successfully programmed with the embedded MicroBlaze™ processor bitstream (emb_proc_sys.bit), as is evident in fig. 16(b).

7. By right-clicking the linker script and selecting Debug As → Debug on Hardware, the software debugging environment opens up as shown in fig. 17. Note that:
I. The universal asynchronous receiver transmitter (UART) serial port (commonly called the serial port) uses a protocol that provides a useful and convenient way of testing processor-based, high-level code. The print command of C is used to display intermediate values of software algorithms.
II. Here we use HyperTerminal to display the results from the FPGA JTAG cable via the serial port of our Intel® Core™ 2 CPU computer. Note that HyperTerminal must have been set up and configured in advance to use the UART-based serial port on the host computer or development platform.

The procedures for setting up and configuring HyperTerminal for the Virtex-5 FXT ML507 FPGA board in Windows are as follows:

8. Select Start → All Programs → Accessories → Communications → HyperTerminal. Set up the terminal to listen to the COM port using a null-modem RS232 serial cable. The COM port allows text inputs and outputs to be read from and written to the hardware under test through the RS232 serial cable. Set Baud rate = 115200, Data = 8 bits, Parity = none, Stop = 1 bit, Flow control = none.

9. In these test runs, we selected the Peripheral Tests and then the Memory Tests applications independently under the Managed Make C Application Project options created in step 3 above. Running the two test applications independently by clicking the Run (or Relaunch) button gives the results in the ml507 HyperTerminal window shown in fig. 18(a) and (b) for the peripheral and memory tests on the designed embedded MicroBlaze™ processor system.
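The serial settings in step 8 fix how fast test output can reach the host. As a rough sketch of the arithmetic, each UART frame carries one start bit, the data bits, an optional parity bit and the stop bits, so the 115200 baud, 8 data bits, no parity, 1 stop bit setting sends 10 bits per byte. The helper names below are hypothetical.

```python
def frame_bits(data_bits=8, parity=False, stop_bits=1):
    """Bits on the wire per character: start + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def bytes_per_second(baud, data_bits=8, parity=False, stop_bits=1):
    """Maximum payload rate for back-to-back frames at the given baud rate."""
    return baud / frame_bits(data_bits, parity, stop_bits)


def transfer_seconds(num_bytes, baud, data_bits=8, parity=False, stop_bits=1):
    """Minimum time needed to send num_bytes characters."""
    return num_bytes / bytes_per_second(baud, data_bits, parity, stop_bits)


if __name__ == "__main__":
    # Settings from step 8: 115200 baud, 8 data bits, no parity, 1 stop bit.
    print(frame_bits())              # bits per character
    print(bytes_per_second(115200))  # characters per second
```

Under these settings a test report of about 1152 characters needs at least 0.1 s on the link, which is negligible for the peripheral and memory tests but would matter when streaming large matrices over the same port.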
Figure 16. (a) Generating the linker script with memory allocation adjustment and (b) programming the FPGA using the Xilinx platform cable USB II from the Xilinx SDK environment via the ISE™ Foundation software.
Figure 17. The debugging environment of the Xilinx SDK integrated development environment for testing and debugging the embedded MicroBlaze™ processor peripheral and memory systems.

Figure 18. Test results for the embedded MicroBlaze™ processor system: (a) peripheral test and (b) memory test.
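To give a feel for what a memory test checks, the following sketch runs two common patterns against a simulated memory: a walking-ones test on the data lines and an address-in-address test. It does not reproduce the Xilinx SDK memory test application used in this work; the `SimulatedMemory` class, the stuck-bit fault model and the function names are all hypothetical.

```python
class SimulatedMemory:
    """Word-addressed 32-bit memory; stuck_at_zero_mask models faulty data bits."""

    def __init__(self, words, stuck_at_zero_mask=0):
        self._cells = [0] * words
        self._keep = ~stuck_at_zero_mask & 0xFFFFFFFF

    def __len__(self):
        return len(self._cells)

    def write(self, index, value):
        # Bits in the fault mask are forced to zero on every write.
        self._cells[index] = value & self._keep

    def read(self, index):
        return self._cells[index]


def walking_ones_ok(mem, index=0, width=32):
    """Write a single 1 in each bit position and read it back."""
    for bit in range(width):
        pattern = 1 << bit
        mem.write(index, pattern)
        if mem.read(index) != pattern:
            return False
    return True


def address_pattern_ok(mem):
    """Store each word's own index, then verify all words afterwards."""
    for i in range(len(mem)):
        mem.write(i, i)
    return all(mem.read(i) == i for i in range(len(mem)))


if __name__ == "__main__":
    good = SimulatedMemory(64)
    bad = SimulatedMemory(64, stuck_at_zero_mask=0x10)
    print(walking_ones_ok(good), address_pattern_ok(good))
    print(walking_ones_ok(bad), address_pattern_ok(bad))
```

The walking-ones pass exposes data-line faults at a single location, while the address pattern is written completely before it is verified so that aliasing between addresses would also be detected.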
Importing and Integrating the MicroBlaze™ Processor System with the System Generator Hardware Algorithmic Model of the Adaptive Matrix Inversion Algorithm

Communication Interface Between System Generator for DSP and EDK for Distributed Co-Processing Applications

The embedded MicroBlaze™ processor system can be imported into the System Generator hardware model via the Xilinx “EDK Processor” block, which can be found in the System Generator for DSP blocksets in the Simulink browser. Note that this EDK Processor block has been included in the “EDK Processor Subsystem” of fig. 6, which is shown in fig. 19. The “EDK Processor” block in fig. 6(a) for co-processor hardware development within the System Generator supports two modes of operation, as shown in fig. 20, namely: 1) the EDK Pcore generation mode, for exporting a System Generator algorithmic co-processor model as a Pcore to the XPS, and 2) the HDL netlisting mode, for importing a pre-designed embedded processor from the XPS into the System Generator model as a netlist. The actual implementation of these two modes of operation is illustrated in fig. 21; the latter is implemented in this work. The reason is that, at the time of this work, the Xilinx XPS and System Generator for DSP only support importing a MicroBlaze™ processor system into a System Generator model, whereas a co-processor can be exported to and integrated with an embedded MicroBlaze™, PowerPC™ 440 or multi-processor system [21, 22]. Thus, in this work, the embedded MicroBlaze™ processor system is imported and integrated with the adaptive matrix inversion algorithm to form an embedded multi-processor system.
Figure 19. The EDK Processor Subsystem before the MicroBlaze™ processor import process.

Figure 20. EDK import and export options within the Xilinx System Generator for DSP via the EDK Processor block.

[Figure 21 diagram (image not reproduced). Recoverable labels: “Export System Generator Model as Pcore to a Pre-Designed Processor System in XPS”; MicroBlaze (MB) Processor; Bus Adapter; Memory Map; RAM; FIFO; Reg; IP Core, User-Defined or Custom Logic; “Import a Pre-Designed Processor System from XPS as HDL Netlist into System Generator Model”.]
Figure 21. Basic structure, interface and communication between the MicroBlaze™ processor and custom logic (or co-processor).

Importing and Integrating the MicroBlaze™ Processor

The procedures for importing the designed embedded MicroBlaze™ processor system from the EDK and integrating it with the hardware model of the adaptive matrix inversion algorithm of fig. 6 in the System Generator for DSP, within the MATLAB/Simulink modeling and simulation environment, for FPGA-in-the-loop implementation and performance verification can be outlined as follows [21, 23, 25]:

1) Preliminaries:
i. It is assumed that the System Generator hardware model of the adaptive matrix inversion algorithm has been created and its performance verified as in Sections V and VI, and that the hardware model has been added to the MATLAB/Simulink path.
ii. It is also assumed that the embedded MicroBlaze™ processor system has been created and tested as in Section VII. However, the MB processor design in Section III-B above is duplicated and is readily used here for this MicroBlaze™ processor import process for the adaptive matrix inversion problem considered here.
iii. It must be verified that an empty pcore directory exists in the embedded MicroBlaze™ processor system directory, at the same level as the embedded MicroBlaze™ processor. The pcore directory exists here; if it does not exist, one must be created and named “pcore”.
iv. The embedded MicroBlaze™ processor system architecture changes during an import or export process. Thus, to guard against mistakes during the import process and to avoid corrupting the original embedded MicroBlaze™ processor system created in the previous section, a copy is made and used here for the FPGA integrated embedded multi-processor system design.

2) Next, we import the MicroBlaze™ processor system into the System Generator hardware model of fig. 6 using the EDK Export Tool selected in the System Generator token of fig. 8, via the HDL netlisting mode in the EDK Processor drop-down menu of the EDK Processor Subsystem shown in fig. 20. The internal structure of the EDK Processor block before the import process is as shown in fig. 19. Note that:
i. The XPS and the ISE™ must be closed before the MicroBlaze™ processor import process, since the XPS project is modified and configured during the import process to work with the System Generator model inside System Generator for DSP.
ii. Once the HDL netlisting mode is selected, a dialogue box pops up requesting the processor to be imported. In this case, it is necessary to browse to and locate the processor system (e.g. Emb_Proc_Sys.xmp) in the Emb_Proc_Sys directory.
iii. The import process copies all the necessary files from the main MicroBlaze™ processor directory into the pcore directory within the XPS project directory and changes the XPS project to allow the MicroBlaze™ processor to communicate with the System Generator model. In fact, the MicroBlaze™ processor directory becomes empty.
Figure 22. (a) The Basic tab with the added input/output shared memories, and (b) the Base Address on the Implementation tab.
Figure 23. The EDK Processor Subsystem after the MicroBlaze™ processor system import process, with additional input and output connections to expose the processor ports
With the HDL netlisting mode selected in fig. 20 and used with the EDK Export Tool selected via the System Generator token of fig. 8, the Xilinx System Generator for DSP is able to import and integrate the embedded MicroBlaze™ processor system into the System Generator hardware model of the adaptive matrix inversion algorithm of fig. 6 within the MATLAB/Simulink environment. The structure of the embedded MicroBlaze™ processor system imported into the System Generator hardware model of the adaptive matrix inversion algorithm is shown by the portion enclosed by the lower bracket in fig. 21.
In this HDL netlisting mode, the MicroBlaze™ processor system that has been added to the System Generator hardware model of the adaptive matrix inversion algorithm is assumed to be just a place-holder.21,27 Its actual implementation is elaborated and filled in by the XPS when the hardware model of the adaptive matrix inversion algorithm is finally integrated with the embedded MicroBlaze™ processor system in the EDK as discussed later in 5.
3) After the processor import process, the EDK Processor block dialogue box is closed by clicking “OK”.
4) Next, the EDK Processor block is re-opened. Under the “Basic” tab, shared memory registers for accessing the inputs and outputs of the System Generator hardware model are added. The shared memory registers used here are fully pipelined distributed Read and Write Memories (RAMs), with additional details shown in fig. 22(a). Under the Implementation tab shown in fig. 22(b), the Base Address is automatically set by the System Generator for DSP after the import process with the added shared memories. Next, under the “Advanced” tab, we expose fpga_0_RS232_Uart_1_RX_pin to enable the write operation for data transmission, fpga_0_rst_1_sys_rst_pin for resetting the FPGA, and fpga_0_RS232_Uart_1_TX_pin to enable the read operation for receiving data. Finally, we connect and configure the exposed processor ports. Fig. 23 shows the wired and configured EDK Processor Subsystem after the MicroBlaze™ processor system import process. Thereafter, we close the EDK Processor Subsystem block.
5) Returning again to the EDK Processor Subsystem, we right-click and select “Look Under Mask” to view the internal structure of the imported MicroBlaze™ processor system with the associated adaptive matrix inversion algorithmic hardware peripherals and memories as shown in fig. 24. Finally, the System Generator for DSP project within the MATLAB/Simulink environment is saved and closed.
As is evident in fig. 24, the completed new embedded processor system consists of custom logic, the generated memory maps and
virtual connections to the custom logic, and the bus adaptor. The completed new processor system also contains a collection of files describing the memories’ and peripherals’ hardware, software drivers, bus connectivity and documentation. It is obvious that the
XPS can allow peripherals and memories to be attached to pre-designed processor systems created within the XPS based on the System Generator hardware model information.21,27
[Figure 24 block diagram. Legible labels include the exposed processor ports (e.g. fpga_0_RS232_Uart_1_TX_pin, fpga_0_RS232_Uart_1_RX_pin and fpga_0_rst_1_sys_rst_pin), the plb_decode and plb_memmap blocks, and the shared memories and registers sm_A_MATRIX, sm_OUT_SEQ, sm_SN_NUM, sm_CON_NUM, sm_PDH_MATRIX, sm_PDHM_INV and sm_CH_FACTORS.]
Figure 24. The structure of the EDK Processor block after the embedded MicroBlaze™ processor import and integration with the System Generator hardware model of the adaptive matrix inversion algorithm
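The role of the generated memory map can be illustrated with a small software model. The sketch below is only a toy in Python, not the logic that System Generator actually emits: the base address, the word size and the memory depths are hypothetical placeholders (the real Base Address is set automatically during the import), and only the memory names are borrowed from the labels in fig. 24.

```python
"""Toy model of a memory-mapped shared-memory decode (illustrative only)."""

# Hypothetical base address and word size; the real values are assigned by
# System Generator during the processor import and are not given in the text.
BASE_ADDRESS = 0x80000000
WORD_BYTES = 4

# (name, depth in 32-bit words). Names follow labels visible in fig. 24;
# the depths are made-up placeholders.
SHARED_MEMORIES = [
    ("A_MATRIX", 16),
    ("OUT_SEQ", 16),
    ("CON_NUM", 1),
]


def build_memory_map(base, memories, word_bytes=WORD_BYTES):
    """Assign each memory a contiguous byte range starting at `base`."""
    table = {}
    offset = 0
    for name, depth in memories:
        start = base + offset
        size = depth * word_bytes
        table[name] = (start, start + size)  # half-open range [start, end)
        offset += size
    return table


def decode(address, table, word_bytes=WORD_BYTES):
    """Map a bus address to (memory name, word index), or None if unmapped."""
    for name, (start, end) in table.items():
        if start <= address < end:
            if (address - start) % word_bytes:
                raise ValueError("unaligned access")
            return name, (address - start) // word_bytes
    return None


if __name__ == "__main__":
    memory_map = build_memory_map(BASE_ADDRESS, SHARED_MEMORIES)
    for name, (start, end) in memory_map.items():
        print(f"{name:10s} 0x{start:08X} - 0x{end - 1:08X}")
    print(decode(BASE_ADDRESS + 0x44, memory_map))
```

In this model a processor access is resolved to one shared memory and a word index inside it, which is the job the bus adaptor and memory map perform in hardware; addresses outside every range are simply reported as unmapped.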
6) Next, we re-open the XPS project via the ISE™ and verify that the System Generator core (sg_plbiface_0) is available in USER under the “Project Local pcores” in the IP Catalogue tab. Note that:
i. The ISE™ will indicate that the project is out of date. This warning is ignored here because the project cannot be compiled at this point, since the processor directory is now empty (i.e. it does not exist any longer). Rather, the previously empty pcore directory now contains the precompiled peripheral. Thus, the project will be compiled from within the MATLAB/Simulink modeling and simulation environment.
ii. Any change made to the pre-designed MicroBlaze™ processor system after the import process requires re-importing the processor system into the System Generator model, and the procedures in 2) to 4) must be repeated.
iii. The instance of the System Generator model should now appear in the “USER” section under the IP Catalogue and also in the block diagram of the embedded processor system as shown in fig. 24. For the matrix inversion example considered here, the instance name is “sg_plbiface_0” (where 0 is the device identification number). Thus nothing needs to be done here.
iv. Unlike in the pcore export process,27,29 where input and output ports are manually assigned and the memory map addresses are manually generated, the instance of the System Generator hardware model (sg_plbiface_0) is automatically added, with its associated memory map address, to the imported MicroBlaze™ processor system, as can be seen under the Ports tab in fig. 25.
Figure 25. Software development using the XPS for initializing the software driver and implementing the embedded MicroBlaze™ processor system for the matrix inverse algorithm
7) Next, we right-click the sg_plbiface_0 pcore in the Bus Interfaces window on the System Assembly View of the XPS and select “View API Documentation”. This provides a complete guide for developing the software that initializes the embedded processor system and the software drivers which read from and write to the embedded processor system peripherals via the pcore, as well as the memory map and address information for the pcore.
8) Using the EDK Processor sg_plbiface_0 API Documentation, whose three code fragments of major interest are given in Appendix C for space economy, the complete C code for initializing and implementing the combined embedded adaptive matrix inversion algorithmic co-processor and the MicroBlaze™ processor system is developed. The sg_plbiface_0 co-processor software drivers perform the read/write operations. The program is developed using the XPS “Add Software Application Project” to create a new project sub-directory called “Emb_Matrix_Inverse”, with an empty source file of the same name (Emb_Matrix_Inverse.c) created using the “Add New Files” option. Following the same procedure, we create additional empty sg_plbiface.c, sg_plbiface.h and xcope.h files. Next, we copy and paste the respective contents of the last three files from the pcore directory in the XPS. Finally, we right-click the Emb_Matrix_Inverse project, select “Mark to Initialize BRAMs” and save the project. The above description is shown in fig. 25 (Note: any attempt to recompile the project here will generate an error since the embedded MicroBlaze™ processor directory is empty). The complete C program (Emb_Matrix_Inverse.c) is given in Appendix D. Note that after creating and saving the software application, an empty MicroBlaze™ processor directory is recreated in the XPS directory. Moreover, as can be seen in fig. 25, there is an option for compilation with Linker Script (highlighted in blue), but we do not need to modify the project, as the Linker Script will be generated in the Xilinx SDK during the FPGA programming after the compilation process by System Generator for DSP within the MATLAB/Simulink modeling and simulation environment.
Figure 26. The combined embedded adaptive matrix inversion algorithmic co-processor (sg_plbiface_0) integrated with the imported embedded MicroBlaze™ processor system
11) Next, the Block Diagram Image and the Design Summary Report are generated. The block diagram of the combined embedded adaptive matrix inversion algorithmic co-processor integrated with the embedded MicroBlaze™ processor system is shown in fig. 26. The actual architecture and properties of the adaptive matrix inversion algorithmic co-processor pcore are shown in fig. 27, which is extracted from the generated preliminary design summary report. Finally, we save and close both the XPS and the ISE™ projects.
Figure 27. The architecture and properties of the adaptive matrix inversion algorithmic co-processor after integration with the imported embedded MicroBlaze™ processor system
Figure 28. The Xilinx SDK integrated development environment for software development, programming the Virtex-5 FX70T FPGA, implementation and performance verification of the combined embedded MicroBlaze™ processor and the adaptive matrix inverse algorithmic co-processor system
12) The System Generator model for the matrix inversion algorithm is re-opened from the MATLAB/Simulink modeling and simulation environment, and the complete system model is updated. Next, we double-click the System Generator token with the previously selected HDL Netlisting and click “Generate” to compile the complete project. Then we select the “Bitstream” option to generate the programming file. In both cases, the System Generator for DSP invokes the XPS, which subsequently calls the ISE™ to compile the project and generate all the necessary files, including the programming file (bitstream), with an SDK directory for the complete embedded system hardware.
13) Next, we export the hardware to the Xilinx SDK directory. Here, we re-open the XPS from the ISE™ by double-clicking Emb_Proc_Sys.xmp, export the completed embedded hardware to the Xilinx SDK, and close the completed project, i.e. the ISE™ and XPS windows.
14) Open the Xilinx SDK, import the hardware, and create a new software platform “Emb_Mat_Inv” and a new managed C application project “Emb_Mat_Inv_Project” as described in Section III-B and demonstrated in fig. 28. Here the compiled Emb_Matrix_Inverse.c appears and is selected under the “New Managed Make C Application Project” dialogue box.
15) Finally, we restart the pre-configured ml507_vin HyperTerminal session, connect and turn ON the ML507 Virtex-5 FXT FPGA, include the edkBmmFile_bd.bmm and emb_proc_sys.bit files, and program the FPGA by downloading the bitstream as shown in fig. 28. Running the complete hardware-software application in debugging mode produces the results displayed on the HyperTerminal window shown in fig. 29.
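Results from a run such as the one in step 15) are usually judged by comparing the fixed-point outputs against a floating-point reference. The following is a minimal sketch in Python, assuming a hypothetical signed fixed-point format with 16 fractional bits in a 32-bit word (the word lengths are not stated at this point in the article) and made-up sample values. It is not the verification code used in this work.

```python
def to_fixed(x, frac_bits=16, word_bits=32):
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scaled = int(round(x * (1 << frac_bits)))
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def from_fixed(q, frac_bits=16):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def max_abs_error(reference, measured):
    """Largest element-wise deviation between two equal-length sequences."""
    return max(abs(r - m) for r, m in zip(reference, measured))


if __name__ == "__main__":
    reference = [0.375, -0.25, 0.5, 3.14159]  # made-up floating-point values
    measured = [from_fixed(to_fixed(v)) for v in reference]
    tolerance = 2.0 ** -17  # half of one LSB for 16 fractional bits
    error = max_abs_error(reference, measured)
    print("max abs error:", error, "within tolerance:", error <= tolerance)
```

With round-to-nearest quantization, the round-trip error of an in-range value is bounded by half a least-significant bit, which is why the tolerance is set to 2^-17 here. A real comparison would use the word lengths chosen during synthesis.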
Figure 29. HyperTerminal display of the simulation result of the combined embedded adaptive matrix inversion algorithmic co-processor integrated with the MicroBlaze™ processor system running on Virtex-5 FX70T ML507 FPGA board
Discussions
In this article, an efficient adaptive matrix inversion algorithm has first been presented that adaptively checks and corrects an ill-conditioned matrix to obtain a well-conditioned positive definite matrix before performing the matrix inversion. Next, a model-based technique has been proposed for embedded processor system design. It has been demonstrated how the model-based technique can be used to map and synthesize an algorithm from a higher level of abstraction into an equivalent fixed-point model. Two models have been identified, namely: 1) the System Generator block model, which is obtained from the direct synthesis of the floating-point MATLAB adaptive matrix inversion algorithm using Xilinx AccelDSP, and 2) the System Generator hardware model, which is obtained from the System Generator block model together with other input-output as well as memory and interfacing peripherals using Xilinx System Generator for DSP and Simulink software.
Before employing the System Generator hardware model for embedded processor system design, it has been suggested and demonstrated that FPGA-in-the-loop hardware co-simulation be conducted to observe how well the hardware model would perform when deployed on the FPGA. The results obtained from the FPGA-in-the-loop hardware co-simulation on the Virtex-5 FX70T ML507 FPGA development board showed good agreement with those obtained with the System Generator hardware model. Based on the acceptable performance of the System Generator hardware model, an adaptive matrix inversion algorithmic co-processor is generated from the System Generator hardware model, which can readily be incorporated as a co-processing peripheral with a pre-designed embedded processor system residing on an FPGA.
Embedding processors inside FPGAs can improve embedded systems performance for real-time applications; however, the choice of the processors, design considerations and implementation techniques may pose several challenges. Comprehensive design techniques for the hardware and software (see also Appendix C) portions, as well as a standard approach for testing a soft-core embedded MicroBlaze™ processor system, have been presented and demonstrated in this article. The main computational power of FPGAs is derived from the number of DSP48E slices 21,22,27 that can be partitioned and incorporated into a given design, including the soft-core embedded MicroBlaze™ processor system design. Thus, comparing the hardware device utilization summary for the adaptive matrix inversion algorithmic co-processor shown in Appendix A, where 42 DSP48E slices out of 128 (about 32%) have been used, with the 4 DSP48E slices out of 128 (about 3%) used by the soft-core embedded MicroBlaze™ processor system shown in Appendix B, it is obvious that the adaptive matrix inversion algorithmic co-processor is indeed an efficient processor as well. To enhance the computational capabilities of the adaptive matrix inversion algorithmic co-processor, the algorithmic co-processor is integrated with the soft-core embedded MicroBlaze™ processor system.
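The utilization figures quoted in this paragraph can be re-derived with a few lines of arithmetic. The sketch below uses only the slice counts stated in the text (42, 4 and 128); the function and constant names are illustrative.

```python
DSP48E_AVAILABLE = 128   # DSP48E slices quoted in the text for the target device
COPROCESSOR_DSP48E = 42  # adaptive matrix inversion co-processor (Appendix A)
MICROBLAZE_DSP48E = 4    # soft-core MicroBlaze processor system (Appendix B)


def utilization_percent(used, available):
    """Return the share of a resource in use, as a percentage."""
    if available <= 0:
        raise ValueError("available must be positive")
    return 100.0 * used / available


if __name__ == "__main__":
    for label, used in (("co-processor", COPROCESSOR_DSP48E),
                        ("MicroBlaze", MICROBLAZE_DSP48E)):
        print(f"{label}: {used}/{DSP48E_AVAILABLE} = "
              f"{utilization_percent(used, DSP48E_AVAILABLE):.1f}%")
```

The exact values are 32.8125% and 3.125%, consistent with the approximate 32% and 3% quoted above.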
Rather than exporting the adaptive matrix inversion algorithmic co-processor into the pre-designed soft-core embedded MicroBlaze™ processor system to form a standalone embedded dual-processor system, the reverse is done here so that only computationally intensive tasks are deployed to the FPGA. The integrated dual-processor system is shown in fig. 24, whereas the block diagram is shown in fig. 26. Note that even though the embedded MicroBlaze™ processor system is imported into and integrated with the adaptive matrix inversion algorithmic co-processor, the actual implementation is carried out in the Xilinx Platform Studio (XPS), while the complete simulation is run from the System Generator for DSP and Simulink environment.
The results obtained from the FPGA-in-the-loop simulation of the embedded dual-processor system running on the Virtex-5 FXT ML507 FPGA board are identical to those obtained from the floating-point MATLAB adaptive matrix inversion algorithm. However, no timing analyses were carried out in this work to ascertain which of the five versions of the adaptive matrix inversion algorithm gives the lowest computation time, i.e., the floating-point MATLAB algorithm, the fixed-point C++ algorithm, the System Generator hardware model, the FPGA-in-the-loop hardware co-simulation, and the integrated dual-processor system (the combined embedded MicroBlaze™ processor system and adaptive matrix inversion algorithmic co-processor). Furthermore, it may also be of interest to compare the computation times of the adaptive matrix inversion algorithm implementation when the adaptive matrix inversion algorithmic co-processor is exported to the embedded processor system 29 and vice versa, as in the current work.
The major advantage of the proposed model-based design approach is that a physical system can be modeled without need for the hardware architecture and implementation routines, provided the algorithms describing the system are synthesizable. Finally, the technique reported in this article employs two sets of processors, namely: 1) the embedded dual-processors due to the embedded MicroBlaze™ and the hardware algorithmic co-processor, and 2) the dual-processors on the Intel® Core™ 2 CPU @ 1.86GHz host (development) computer.
In this way, the adaptive matrix inversion algorithm, which is assumed to be computationally intensive, is implemented on the embedded dual-processors while the inputs and outputs are read from and written to the host computer.
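To make the "check and correct, then invert" idea summarized in these discussions concrete, the following Python sketch shows one generic way such a scheme can be organized. It is not the algorithm proposed in this article. It assumes a symmetric input matrix, uses failure of the Cholesky factorization as the only test, and corrects the matrix by adding a growing multiple of the identity (a Marquardt-style shift). The names adaptive_inverse, shift and growth and their default values are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T, or None if the symmetric
    matrix a is not numerically positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None  # factorization breaks down
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def invert_from_cholesky(low):
    """Invert a = L L^T by solving L L^T x = e_k for each unit vector e_k."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        y = [0.0] * n  # forward substitution: L y = e_col
        for i in range(n):
            rhs = 1.0 if i == col else 0.0
            y[i] = (rhs - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
        x = [0.0] * n  # back substitution: L^T x = y
        for i in reversed(range(n)):
            acc = sum(low[k][i] * x[k] for k in range(i + 1, n))
            x[i] = (y[i] - acc) / low[i][i]
        for row in range(n):
            inv[row][col] = x[row]
    return inv


def adaptive_inverse(a, shift=1e-6, growth=10.0, max_tries=20):
    """Add a growing multiple of the identity to a until the Cholesky
    factorization succeeds, then invert. Returns (inverse, applied_shift)."""
    n = len(a)
    total = 0.0
    work = [row[:] for row in a]
    for _ in range(max_tries):
        low = cholesky(work)
        if low is not None:
            return invert_from_cholesky(low), total
        total = shift if total == 0.0 else total * growth
        work = [[a[i][j] + (total if i == j else 0.0) for j in range(n)]
                for i in range(n)]
    raise ValueError("matrix could not be made positive definite")


if __name__ == "__main__":
    print(adaptive_inverse([[4.0, 2.0], [2.0, 3.0]]))  # already positive definite
    print(adaptive_inverse([[1.0, 1.0], [1.0, 1.0]]))  # singular, gets shifted
```

A production version would more likely test a condition-number estimate rather than wait for the factorization to fail; the CON_NUM and CH_FACTORS labels in fig. 24 suggest that the hardware model tracks such quantities, although this sketch does not attempt to reproduce them.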
Appendix A. A portion of the Xilinx XFLOW synthesis and implementation report showing the device utilization summary for the System Generator hardware model of the adaptive matrix inversion algorithm generated by Xilinx BITGEN.
Appendix B. The Embedded MicroBlaze™ Processor System Design Summary Generated by the Xilinx ISE™ Foundation.
Appendix C. EDK Processor sg_plbiface API Documentation for Software Development and Drivers Initialization.
Appendix C-1. EDK Processor – sg_plbiface API
Appendix C-2. The sg_plbiface shared memory settings and software driver functions.
Appendix C-3. The sg_plbiface driver performance optimization
Appendix D. The C program for the implementation and performance verification of the combined embedded adaptive matrix inversion algorithmic co-processor integrated with the imported MicroBlaze™ processor system.
References
1. Monmasson E, Cirstea MN. FPGAs design methodology for industrial control systems – A review. IEEE Transactions on Industrial Electronics Aug 2011; 54(4): 1824–42.
2. Cardenas EO, Romero-Troncoso RJ. MLP neural network and on-line backpropagation learning implementation in a low-cost FPGA. In Proc. of the 18th ACM Great Lakes Symposium on VLSI (GLSVLSI’08), Orlando, Florida, USA, May 4–6, 2008: 333–38.
3. Malinowski A, Yu H. Comparison of embedded system design for industrial applications. IEEE Transactions on Industrial Informatics May 2011; 7(2): 244–54.
4. Monmasson E, Idkhajine L, Cirstea MN et al. FPGAs in industrial control applications. IEEE Transactions on Industrial Informatics May 2011; 7(2): 224–42.
5. Akpan VA. Hard and soft embedded FPGA processor systems design: Design considerations and performance comparisons. International Journal of Engineering and Technology 2012; 2: 21.
6. Fletcher BH. FPGA embedded processors: Revealing true system performance. Embedded Systems Conference, San Francisco, 2005: 1–18.
7. Pellerin D, Bodenner R. Using FPGAs as coprocessors for DSP and image processing. Embedded Training Program, Embedded Systems Conference, ESC-263, San Jose, USA, 2007: 1–13.
8. Moertti G. System-level design merits a closer look: the complexity of today's designs requires system-level. EDN Asia, February 01, 2002: 22–28. Available from: http://www.ednasia.com/article-1129systemleveldesignmeritsacloserlookAsia.html.
9. Gullapalli V, Shi K. Hierarchical design techniques. Synopsys, January 2004: 1–13. Available from: www.synopsys.com.
10. Martin G. The future of high-level modelling and system level design: Some possible methodology scenarios. Cadence Design Systems. Available from: http://www.eda.org/edps/edp02/PAPERS/edp02-s11.pdf.
11. Akpan VA. An Introductory Note on FPGA-Based Embedded System Design Technologies: With An Overview of the Xilinx Systems Design Tools. Technical Report, Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece, November 2009. Available from: http://users.auth.gr/~iosamar/technicalreports.htm.
12. Akpan VA. Model-based embedded-processor systems design methodologies: Modeling, syntheses, implementation and validation. African Journal of Computing & ICT 2012; 5(1): 1–26. Available from: http://www.ajocict.net/uploads/Final_Akpan__Model-Based_FPGA_-_2011.pdf.
13. Kiusalaas J. Numerical Methods in Engineering with MATLAB. Cambridge University Press, New York, U.S.A., 2005.
14. Marquardt DW. An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math 1963; 11(2): 431–41.
15. Akpan VA, Hassapis GD. Training dynamic feedforward neural networks for online nonlinear model identification and control applications. International Reviews of Automatic Control: Theory & Applications 2011; 4(3): 335–50.
16. Tsatsomeros MJ, Li L. A recursive test for P-matrices. BIT 2000; 40(2): 410–14.
17. The MathWorks Inc., MATLAB & Simulink R2009a, Natick, USA. Available from: www.mathworks.com.
18. Xilinx AccelDSP Style, MATLAB for Synthesis: Style Guide, UG637 (v11.4), December 2, 2009: 1–232. Available from: www.xilinx.com.
19. Xilinx AccelDSP Synthesis Tool: User Guide, UG634 (v11.4), December 2, 2009: 1–222. Available from: www.xilinx.com.
20. AccelWare DSP IP Toolkits: User Guide, Release 9.2.00, August 2007: 1–290.
21. Xilinx System Generator for DSP, User Guide, UG640 (v12.1), April 19, 2010: 1–414. Available from: www.xilinx.com.
22. Xilinx System Generator for DSP, Reference Guide, UG638 (v11.4), December 2, 2009: 1–522. Available from: www.xilinx.com.
23. EDK Concepts, Tools, and Techniques: A Hands-On Guide to Effective Embedded System Design, EDK 11.4, 114 pp., 2009. Available from: http://www.xilinx.com/support/documentation/dt_edk_edk11-1.htm.
24. Platform Specification Format Reference Manual: Embedded Development Kit (EDK), v12.1, April 19, 2010: 1–140. Available from: http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/psf_rm.pdf.
25. Xilinx ISE In-Depth Tutorial, v12.1, April 19, 2010: 1–152.
26. ISE Simulator (ISim): In-Depth Tutorial, v1.0, April 27, 2009: 1–62.
27. Akpan VA. Development of new model-based adaptive predictive control algorithms and their implementation on real-time embedded systems. Ph.D. Dissertation, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece, Jul 2011, 517 pages. Available from: http://invenio.lib.auth.gr/record/127274/files/GRI-2011-7292.pdf.
28. Data2MEM: User Guide, UG658, Version 1.0, April 27, 2009: 1–44. Available from: www.xilinx.com.
29. Akpan VA. An FPGA realization of integrated embedded multi-processors system: A hardware-software co-design approach. International Journal of Computer Science and Emerging Technologies 2013; 4, 20 pages.