Implementation Scenario for Teaching Partial Reconfiguration of FPGA

Pierre Leray, Amor Nafkha, Christophe Moy
SUPELEC/IETR, Avenue de la Boulais, CS 47601, 35576 Cesson-Sévigné Cedex, France
[email protected]

Abstract – This paper presents a lab on partial reconfiguration (PR) of FPGA for a video application. The lab is intended for final-year engineering students. The implementation target is a Xilinx Virtex5 on an ML506 design kit board. The structure of the proposed design, the design steps and the results obtained are detailed. The lab builds on the research done by the authors on software radio and cognitive radio over the last decade.

Index terms – partial reconfiguration of FPGA, Virtex, ICAP, education

I. INTRODUCTION

The lab presented in this paper is a heritage of the authors' research on software radio [1] [2] and cognitive radio [3]. Future flexible radio operation implies the use of heterogeneous processing units such as DSP, FPGA, GPP, and ASIC. We have argued, however, that efficiency (in terms of processing power, power consumption, etc.) can only be guaranteed if a management layer dedicated to reconfiguration is added to the radio processing [4][5][6][7]. The requirement for local and fast reconfiguration was also identified at that time [8]. As processor reconfiguration is not a breakthrough, a special focus has been put on the hardware side, namely FPGA, in order to complete the heterogeneous management capabilities: from first experiments [9], to the implementation of realistic radio algorithms [10] and system integration [11].

In the reconfigurable hardware domain, this is known as partial reconfiguration (PR) of FPGA [12]. Reconfiguration speed is a crucial feature of partial reconfiguration, and even a condition of its relevance for the software radio and cognitive radio community. That is why we studied the possible means of reducing reconfiguration time along two axes: decreasing the partial bitstream size on the one hand, and increasing the transfer speed to the configuration plane on the other hand. This implies for instance:
- favoring parameterization techniques at design time [13],
- speeding up the ICAP interface to its maximum technological capabilities [14].

Software radio is an application context that does not differ much from any other real-time embedded electronics domain. This makes partial reconfiguration of FPGA (combined with a management architecture) useful for many other application contexts, in particular image and video processing. This has been a common interest that we also addressed with video processing researchers, both for joint radio and video contexts [15][16] and for video processing alone. For instance, the TransMedi@ project [17] of the Brittany Region pole of excellence Images and Networks (Images et Réseaux) addressed the FPGA PR solution for video transcoding in infrastructure servers. We now believe it is time to spread PR technology to industrial applications, and consequently time to educate future engineers.

The paper is organized as follows. Part II describes the project we propose as a lab to final-year students, just before they graduate with their engineering diploma. Part III explains how reconfiguration management is deployed in the context of partial reconfiguration of FPGA. Part IV summarizes the design flow for partial reconfiguration. Finally, implementation results are given in part V, followed by concluding remarks in the last section.

II. PROJECT DESCRIPTION

The student lab consists in implementing flexible real-time video processing. The video processing is changed on the fly by dynamically reconfiguring part of the FPGA processing area.

A. Hardware platform architecture

Video transcoding is performed in an FPGA. The hardware platform is made of a Xilinx ML506 design kit, a host PC and a screen for display, as shown in Figure 1.

[Figure 1 – System architecture. Blocks shown: Xilinx ML506 board; Virtex5-SX50 FPGA; configuration memory (bitstreams); reconfiguration manager (MicroBlaze, ICAP controller); video processing (Video_128_in, reconfigurable processing unit, Video_256_out); video coder (RGB video source); DVI controller.]

The host PC plays three roles:
- development platform,
- video server,
- highest-level reconfiguration manager.

The screen is directly connected to the kit through a video connector. A serial link connects the host PC and the board for reconfiguration management, in order to:
- load partial bitstreams into the configuration memory at initialization,
- send reconfiguration orders for on-the-fly adaptation of the video processing.

B. Functional architecture

The FPGA processing is made of two distinct pieces: on the one hand, the video processing chain detailed in this paragraph; on the other hand, the management architecture that must be added to the processing in order to enable dynamic and correct reconfiguration, as defined in our research work on management architectures [7]. The latter is described in the specific context of FPGA in part III of this paper.

Module “video_128_in” of Figure 1 receives data from the video coder of the design kit and stores it in a 128 by 128 pixel memory embedded in the FPGA (input picture memory).

A processing unit (PU) then performs the video processing (see next section) and stores the result in a 256 by 256 pixel memory embedded in the FPGA (output picture memory).

Module “video_256_out” of Figure 1 sends data from the FPGA to the DVI (Digital Visual Interface) controller of the design kit.

The goal is to dynamically change the PU operation without interrupting the video stream.

C. Video application

The goal is to perform video transcoding on a 60 frames per second video stream. This kind of application is found in the data infrastructure context, where the video stream may be compressed to fit the bandwidth requirements of a given area. Another need is to transcode a given high-quality video stream into several lower-quality streams, for example from an HD TV broadcast stream to a mobile phone format.

We propose here to switch between two kinds of transcoding: either enhance the quality of the input video stream, or broadcast the input video stream towards 4 lower-quality receivers.

In the first case, the algorithm used to change a 128x128 pixel data stream into a 256x256 pixel data stream is the H-264 semi-pixel upscaling of Figure 2.
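As an illustration, the filter equations given in Figure 2 can be modelled in a few lines of Python. This is only a software sketch of the arithmetic, not the students' hardware PU: the function names, the 8-bit clipping range and the border replication used in `upscale_row` are assumptions made for the example, and the center sample is assumed to be filtered from six unnormalized intermediate sums, as the division by 1024 suggests.

```python
def clip(value, lo=0, hi=255):
    """Clamp an integer to the pixel range (8-bit range assumed here)."""
    return max(lo, min(hi, value))


def six_tap(p0, p1, p2, p3, p4, p5):
    """Unnormalized 6-tap filter (1, -5, 20, 20, -5, 1), as in a1 and x1."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def half_pixel(p0, p1, p2, p3, p4, p5):
    """Half-pixel sample between p2 and p3: Clip((sum + 16) >> 5)."""
    return clip((six_tap(p0, p1, p2, p3, p4, p5) + 16) >> 5)


def center_half_pixel(s0, s1, s2, s3, s4, s5):
    """Center sample from six unnormalized sums: Clip((sum + 512) >> 10)."""
    return clip((six_tap(s0, s1, s2, s3, s4, s5) + 512) >> 10)


def upscale_row(row):
    """Double the length of one pixel row by inserting half-pixel samples.

    Neighbor indices are clamped at the borders (border replication is an
    assumption of this sketch, not something stated in the paper).
    """
    n = len(row)
    out = []
    for i in range(n):
        taps = [row[min(max(i + k, 0), n - 1)] for k in range(-2, 4)]
        out.append(row[i])             # original full-pixel sample
        out.append(half_pixel(*taps))  # interpolated sample to its right
    return out
```

Applying `upscale_row` to every row, then the same filter to every column, would turn a 128x128 picture into a 256x256 one; a flat area stays flat because the six coefficients sum to 32, which the shift by 5 divides out.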

[Figure 2 – H-264 semi-pixel upscaling schematic view. The figure labels the full-pixel samples E to P and R and the interpolated samples a, b, c, d, e, f, j and x, and gives the following filter equations:]

a1 = E - 5*F + 20*G + 20*H - 5*I + J
a = Clip((a1 + 16) >> 5)

j1 = a - 5*b + 20*c1 + 20*d1 - 5*e + f
j = Clip((j1 + 512) >> 10)

x1 = E - 5*K + 20*L + 20*M - 5*N + O
x = Clip((x1 + 16) >> 5)

The second context only consists in duplicating the input data stream 4 times.

III. PR MANAGEMENT ARCHITECTURE

An FPGA is configured by loading binary data, called a bitstream, into the configuration plane. Changing the FPGA operation (partially or totally) means loading a new bitstream: either it reconfigures the whole FPGA (total bitstream), or only a sub-part of the FPGA, which is called partial reconfiguration.

A. PR design approach

Two modes are possible for dynamic reconfiguration, depending on the reconfiguration initiator. Either reconfiguration is done by a processor external to the FPGA through the JTAG, serial port or SelectMap interface, or partial reconfiguration is performed by a processor core (a MicroBlaze soft core for instance) embedded in the FPGA to be reconfigured. The latter is called self-reconfiguration. In this case the embedded processor reaches the configuration plane through the ICAP interface, which gives the best reconfiguration speed, as shown in Table 1.

Configuration Mode    Max Clock Rate    Data Width    Max Bandwidth
SelectMap / ICAP      100 MHz           32-bit        3.2 Gbps
Serial Mode           100 MHz           1-bit         100 Mbps
JTAG                  66 MHz            1-bit         66 Mbps

Table 1 – Reconfiguration throughput for Virtex5 family [18]

In this student lab we chose a self-reconfiguring approach for the FPGA (sometimes also called auto-reconfiguration) [5].

B. MicroBlaze and its software drivers

In this design we chose to use a MicroBlaze soft core to load the bitstream from the memory to the ICAP interface, as shown in Figure 3. Bitstreams are stored in a memory of the design kit board that is off-chip (external to the FPGA). Students had to develop the soft driver shown at the top right of Figure 3, to be executed by the MicroBlaze. This driver's task is to control the ICAP interface in order to perform partial reconfiguration. The MicroBlaze also manages the bitstream table, in order to select the correct bitstream depending on the reconfiguration order it receives. This reconfiguration order typically comes from the higher layers of the management architecture, as explained in [7] in a cognitive radio context.
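The role of the soft driver of Figure 3 (wait for an order while the configuration is not busy, read the bitstream attributes in the configuration table, send them to the ICAP controller) can be sketched as a small Python model. It is a behavioural illustration only: the lab driver is written in C for the MicroBlaze, and the names `BitstreamEntry`, `IcapControllerModel` and `handle_order`, as well as the table layout, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitstreamEntry:
    """Attributes of one partial bitstream (hypothetical table layout)."""
    address: int  # start address in the configuration memory
    length: int   # length in 32-bit words


class IcapControllerModel:
    """Stand-in for the address and length registers of the ICAP controller."""

    def __init__(self):
        self.address = 0
        self.length = 0

    @property
    def busy(self):
        # Config busy is TRUE as long as words remain to be transferred.
        return self.length != 0


def handle_order(order_id, table, controller):
    """Process one reconfiguration order; return True if it was started."""
    if controller.busy:
        return False               # a reconfiguration is still in progress
    entry = table.get(order_id)
    if entry is None:
        return False               # unknown configuration ID
    controller.address = entry.address
    controller.length = entry.length  # a non-zero length starts the transfer
    return True
```

In this model an order received while a transfer is running is simply rejected; a real driver could equally queue it, which is a design choice the paper leaves open.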

[Figure 3 – Resources supporting reconfiguration management. The MicroBlaze platform (MicroBlaze CPU, PLB) runs the soft driver: wait for a reconfiguration order while Config busy is FALSE, read the bitstream attributes in the configuration table, send the bitstream attributes to the ICAP controller. The ICAP controller (length register, address register, controller) implements the hard driver: wait on the Length register, set Config busy to TRUE, transfer directly from memory to ICAP (Address++, Length--), and set Config busy to FALSE when Length == 0. The configuration memory and the ICAP primitive are connected through 32-bit links; the ICAP primitive is annotated 400 MB/s.]
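The hard driver loop of Figure 3 can be simulated in software to check orders of magnitude. The sketch below is a Python model, not the VHDL the students write; it assumes one 32-bit word is transferred per clock cycle at 100 MHz (the ICAP figures of Table 1) and that a 57 kByte bitstream means 57 000 bytes. The function names are hypothetical.

```python
def run_hard_driver(memory, address, length, icap_write):
    """Model the transfer loop: one word per cycle until Length reaches 0.

    `memory` is a list of 32-bit words, `icap_write` a callable standing in
    for the ICAP primitive. Returns the number of clock cycles spent.
    """
    cycles = 0
    while length != 0:               # Config busy is TRUE in this loop
        icap_write(memory[address])  # direct transfer, memory to ICAP
        address += 1
        length -= 1
        cycles += 1
    return cycles                    # Config busy returns to FALSE here


def transfer_time_us(num_bytes, clock_hz=100_000_000, word_bytes=4):
    """Transfer time in microseconds at one word per clock cycle."""
    words = -(-num_bytes // word_bytes)  # ceiling division
    return words * 1_000_000 / clock_hz
```

With these assumptions, `transfer_time_us(57_000)` gives 142.5 µs for the 57 kByte partial bitstream of the upscaling PU, which is the effective download time reported in part V.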

C. ICAP controller

A bitstream transfer from the off-chip memory to the ICAP is provided by default by the Xilinx ISE development tools. It is a software approach executed by the MicroBlaze (XPS HWICAP core and API). Its data throughput to the ICAP is far from the theoretical technological capability of 3.2 Gbps (see Table 1). That is why we propose a DMA (Direct Memory Access) interface between the off-chip memory and the ICAP primitive, which makes it possible to reach the theoretical transfer bandwidth [14]. Another task for the students is therefore to build in VHDL the hard ICAP controller that implements the state machine shown at the bottom right of Figure 3.

D. Loading procedure

The first implementation step of the lab consists for the students in loading the global bitstream (developed as shown in part IV) in order to implement the MicroBlaze, the interfaces, and the video processing chain with a default PU. Students then develop and execute a C program to download the partial bitstreams (developed as shown in part IV) from the host PC to the off-chip configuration memory on the design kit board. The MicroBlaze is then ready and waits for a reconfiguration order. In this lab, the reconfiguration orders come from the host: the user sends an ID tag through a HyperTerminal window to select a configuration. The user here plays the role of the equipment manager, but autonomous decision schemes could be imagined [7].
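The initialization part of this procedure, storing the partial bitstreams and remembering where each one sits, can be sketched as follows. This is a Python illustration under stated assumptions, not the C program of the lab: the bitstreams are assumed to be stored back to back, the memory is modelled as a list of 32-bit words, and the big-endian word order and zero padding are arbitrary choices of the example. `to_words` and `load_partial_bitstreams` are hypothetical names.

```python
import struct


def to_words(data):
    """Convert raw bytes to 32-bit words (big-endian, zero-padded: assumed)."""
    padded = data + b"\x00" * (-len(data) % 4)
    return list(struct.unpack(f">{len(padded) // 4}I", padded))


def load_partial_bitstreams(bitstreams, memory):
    """Append each bitstream to `memory` and build the configuration table.

    `bitstreams` maps an ID tag to the raw bytes received from the host.
    Returns {id_tag: (address, length)} with both values in 32-bit words.
    """
    table = {}
    for tag, data in bitstreams.items():
        words = to_words(data)
        table[tag] = (len(memory), len(words))  # where it starts, how long
        memory.extend(words)
    return table
```

Once the table is built, an ID tag typed by the user is enough to retrieve the address and length that the ICAP controller needs.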

IV. DESIGN FLOW

Partial reconfiguration implementation requires a specific design methodology. Students had to carry out their design in 3 steps:
- description and synthesis of the hardware platform,
- design implementation and generation of the configuration bitstreams,
- development of the C code for the MicroBlaze.

The PR design flow relies on two Xilinx design environments, as shown in Figure 4:
- Xilinx ISE Project Navigator and EDK Xilinx Platform Studio for hierarchical/modular design description and synthesis,
- PlanAhead for the floorplan design and the definition of reconfigurable areas, up to the generation of the global and partial bitstreams.

The first step consists in describing and synthesizing all the functional blocks of the design. The PR flow imposes a modular design approach: each configuration of the reconfigurable PU must be synthesized separately. The resulting files represent, at gate level (netlists), the top-level design and all the modules present in the global architecture.

The second step consists in building the floorplan of the target device, specifying the FPGA areas to which the different modules are allocated. The « Set Reconfigurable » attribute is given to the parts that hold reconfigurable modules (PU). PlanAhead is a graphical tool that helps with floorplan design. It also handles final place and route, as well as configuration file (bitstream) generation. In PR mode, PlanAhead manages partial reconfiguration by separating static areas from dynamic areas. It also generates a partial bitstream for each version of each dynamically reconfigurable area.

[Figure 4 – Hardware design flow steps. Design description: top module, static module (MicroBlaze platform, ICAP controller, video interfaces), reconfigurable module PU (configuration A, configuration B). Synthesis: netlists. Floorplanning: draw reconfigurable partition, specify any configuration floorplan. Place/route/generate bitstreams: run implementation of static and reconfigurable modules for each configuration. Output: bitstreams.]

The Xilinx SDK environment is used in the last step to develop the C code executed by the MicroBlaze soft core. Students must program the MicroBlaze to:
- load partial bitstreams from the host into the configuration memory,
- control reconfiguration when an order is transmitted by the management hierarchy (host).

V. IMPLEMENTATION RESULTS

A. Application

The design kit board comprises one Xilinx Virtex5-SX50T FPGA clocked at 100 MHz, whose main characteristics are:
- 32640 slices,
- 132 block RAMs of 36 kb (4752 kb),
- 288 DSP blocks,
- global bitstream size: 2.5 MBytes.

The upscaling PU is clocked at 200 MHz for a performance of 200 Mpixels/s. Its complexity is 1241 slices and the corresponding partial bitstream size is 57 kBytes.

B. Reconfiguration

Reconfiguration performance is considered here in terms of reconfiguration speed, or reconfiguration time. What counts as good performance depends on the constraints of each application; the idea is that the reconfiguration overhead must be negligible compared to the PU's processing duration. In a 60 frames per second video stream, a new frame is displayed every 17 ms. The reconfiguration of the upscaling PU has been measured to take 150 µs, i.e. less than 1% of a frame period. Consequently, we can consider that reconfiguration does not add an unacceptable overhead compared to the processing load. This illustrates the relevance of FPGA PR for ultra-fast adaptation in real-time systems.

The reconfiguration time of 150 µs corresponds to a reconfiguration throughput of 3.04 Gbps:

$$R_{throughput} = \frac{57 \times 10^{3} \times 8}{150 \times 10^{-6}} = 3.04 \times 10^{9}\ \text{bps}$$

In fact, the maximum technological throughput of 3.2 Gbps on Virtex5 devices is reached after an initial overhead of 7.5 µs. The effective download time is consequently 142.5 µs, for a total reconfiguration time of 150 µs, and then:

$$R_{throughput} = \frac{57 \times 10^{3} \times 8}{142.5 \times 10^{-6}} = 3.2 \times 10^{9}\ \text{bps}$$

This result was published at the ReConFig conference in 2009 [14].

VI. CONCLUSIONS AND FUTURE WORK

This paper shows how a partial reconfiguration design and a real experiment are carried out in a student lab. This lab is based on the research activities and previous research results of the professors in the domain of software and cognitive radio. However, partial reconfiguration technology may be useful in many other domains requiring both high performance (in terms of processing power) and flexibility. Provided that reconfiguration management is considered as important as the processing itself, partial reconfiguration of FPGA opens a new era in reconfigurable computing, combining hardware performance and software flexibility. Students can experiment with this new paradigm in this lab and then disseminate the technology to industry after graduating.

VII. ACKNOWLEDGMENT

The authors thank Xilinx for their support with teaching material (software licenses and tutorials), and for the early access they gave to PlanAhead in the past [19].

VIII. REFERENCES

[1] Mitola J., “The Software Radio Architecture,” IEEE Comms. Mag., vol. 33, no. 5, pp. 26-38, May 1995
[2] Kountouris A., Moy C., and Rambaud L., "Reconfigurability: A Key Property in Software Radio Systems", First Karlsruhe Workshop on Software Radios, Germany, 29-30 Mar. 2000
[3] Mitola J., “Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio”, Ph.D. dis. Royal Inst. of Tech., Sweden, 2000
[4] Kountouris A. and Moy C., "Reconfiguration in Software Radio systems", Karlsruhe Workshop on Software Radio, Germany, Mar. 2002
[5] Delahaye J.-P., Palicot J., Leray P., "A Hierarchical Modeling Approach in Software Defined Radio System Design," SIPS 2005, Athens-Greece, Nov. 2005.
[6] Godard L., Moy C. and Palicot J., "From a Configuration Management to a Cognitive Radio Management of SDR Systems", CrownCom'06, 8-10 June 2006, Mykonos, Greece
[7] Moy C., "High-Level Design Approach for the Specification of Cognitive Radio Equipments Management APIs", Journal of Network and System Management, vol. 18, n° 1, pp. 64-96, Mar. 2010
[8] Delahaye J.P., Leray P., Moy C. and Palicot J., "Managing Dynamic Partial Reconfiguration on Heterogeneous SDR Platforms", SDR Forum Technical Conference’05, Anaheim (USA), November 2005
[9] Delahaye J.P., Gogniat G., Roland C., Bomel P., "Software Radio and Dynamic Reconfiguration on a DSP/FPGA Platform," 3rd Karlsruhe Workshop on Software Radios, proc. pp 143-151, Karlsruhe Germany, March 17-18 2004.
[10] Delahaye J.P., Palicot J., Moy C., Leray P., “Partial Reconfiguration of FPGAs for Dynamical Reconfiguration of a Software Radio Platform”, IST Mobile and Wireless Communications Summit'07, 1-5 July 2007, Budapest, Hungary
[11] Delorme J., Martin J., Nafkha A., Moy C., Clermidy F., Leray P., Palicot J., “A FPGA partial reconfiguration design approach for cognitive radio based on NoC architecture”, IEEE New Circuits and Systems Conference, NEWCAS, 22-25 June 2008, Montréal, Canada
[12] "Virtex Series Configuration Architecture User Guide," Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124, XAPP151 (v1.6) March 24, 2003
[13] Gul S.T., Alaus L., Noguet D., Moy C. and Palicot J., "The Common Operator Technique: An Optimization Process to Identify and Design a Set of Common Operators to Perform SDR Equipment", ICT Mobile Summit’09, 10-12 June 2009, Santander, Spain
[14] Delorme J., Nafkha A., Leray P., Moy C., “New OPBHWICAP interface for real-time Partial reconfiguration of FPGA”, International Conference on ReConFigurable Computing and FPGAs, ReConFig'09, Cancun, Mexico, 9-11 Dec 2009
[15] Raulet M., Urban F., Nezan J.F., Moy C., Deforges O., Sorel Y., "Rapid Prototyping for Heterogeneous Multicomponent Systems: an MPEG-4 Stream Over an UMTS Communication Link", Eurasip Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, Kluwer Academic Publishers, Volume 2006 (2006), Article ID 64369
[16] Moy C., Raulet M., "High-Level Design for Ultra-Fast Software Defined Radio Prototyping on Multi-Processors Heterogeneous Platforms", Journal on Advances in Electronics and Telecommunications, Radio Communication Series: special issue on Recent Advances and Future Trends in Wireless Communications, Vol. 1, n° 1, pp. 67-85, April 2010
[17] http://hpcas.enstb.org/transmedia
[18] Xilinx tutorial presentation, “Introduction to Partial Reconfiguration Methodology”, 2010
[19] Xilinx, Early access partial reconfiguration user guide, ug208, 2006.
