A Reconfigurable Hardware Platform Implementation for Software Defined Radio using Dynamic Partial Reconfiguration on Xilinx Zynq FPGA

Ahmed Kamaleldin1, Sherif Hosny2, Khaled Mohamed3, Mostafa Gamal1, Abdelrhman Hussien1, Eslam Elnader1, Ahmed Shalash1, Abdelfattah M. Obeid4, Yehea Ismail5, and Hassan Mostafa1,5

1 Electronics and Communications Engineering Department, Cairo University, Giza 12613, Egypt.
2 Mentor Graphics.
3 Electronics Department, Faculty of Information Engineering and Technology, German University in Cairo, Cairo, Egypt.
4 King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia.
5 Center for Nano-electronics & Devices, American University in Cairo & Zewail City for Science and Technology, Cairo, Egypt.

Abstract— Dynamic Partial Reconfiguration (DPR) can be used efficiently to implement a reconfigurable hardware platform for a Software Defined Radio (SDR) system that supports multiple wireless standards. This method optimizes several design metrics such as hardware resources, power, and reconfiguration time. Nevertheless, partitioning is a challenging issue in the DPR flow. In this work, we implement two design approaches: a single-partition approach and a multi-partition approach that uses a partitioning algorithm introduced in the literature. A complete DPR design flow is discussed, and the two approaches are compared on a Xilinx Zynq FPGA. The multi-partition approach gives 16% less reconfiguration time while reducing the reconfiguration area and power consumption by 4.5% and 9.8%, respectively.

Keywords— Reconfigurable Architecture, Dynamic Partial Reconfiguration, Software Defined Radio, Partitioning.

I. INTRODUCTION

Modern wireless communication systems are software-defined systems that support multiple wireless standards and are implemented on reconfigurable hardware platforms [1]. Software Defined Radio (SDR) provides the flexibility to reconfigure the communication chain according to the required wireless standard while reusing the same physical hardware resources. Field Programmable Gate Arrays (FPGAs) are a promising reconfigurable hardware platform for implementing an SDR system, because DPR techniques allow them to meet the requirements of fast reconfigurability, hardware optimization, and low power consumption. DPR makes it possible to reconfigure a portion of the FPGA device at runtime while the rest of the FPGA remains active. Reconfiguration time and resource allocation are the main factors that determine system performance.

In this paper, we present two reconfigurable hardware implementations of an SDR system, based on two different DPR partitioning (resource allocation) techniques and targeting a Zynq FPGA. The proposed reconfigurable hardware for the SDR system supports three wireless physical-layer transmitter chains (3G, WIFI, and LTE), which are used as a case study. The two DPR implementations are evaluated on the total reconfiguration time required to switch between the wireless communication chains, the total resource utilization, and the total power consumption of the proposed designs.

This paper is organized as follows: Section II gives background on the runtime DPR technique and reviews previous work. Section III gives a short introduction to SDR and the communication standards supported by the proposed reconfigurable hardware platform. Section IV introduces the DPR implementation flow. Experimental results and performance evaluation are described in Section V. Finally, Section VI concludes the paper.

978-1-5090-6389-5/17/$31.00 ©2017 IEEE

II. BACKGROUND AND PREVIOUS WORK

Commercially available SDR platforms are reprogrammed by software routines that run on fixed hardware platforms such as General-Purpose Processors (GPPs) or Digital Signal Processors (DSPs). GPPs and DSPs are not suitable reconfigurable hardware platforms for the high data rates and low power constraints required by baseband signal processing in modern SDR systems. FPGAs can support high data rates and high bandwidth for current and next generations of wireless communication standards, with reduced power consumption and reasonable hardware resource utilization. Runtime DPR offers the hardware flexibility needed to reconfigure the SDR system from one wireless standard to another while reusing the same FPGA hardware resources.

The DPR design flow requires partitioning the system into a static part and a dynamic part [2]. The static part contains the static modules, which are not changed during reconfiguration. The dynamic (reconfigurable) part contains a number of Reconfigurable Regions (RRs). Each RR has a set of reconfigurable modules (RMs) that can be swapped during runtime without interruption.

Several previous works have discussed the advantages of applying DPR to the implementation of a reconfigurable hardware platform for SDR.
A dynamic cognitive radio implementation with fast reconfiguration is presented in [3] using the Zynq FPGA. That system uses the Programmable Logic (PL) to implement the baseband and the ARM processor for the MAC layer. Our recently published contribution [4] is a multi-standard DPR SDR implementation that demonstrates the feasibility of DPR for implementing SDR. However, that system does not consider a full SDR terminal (Tx/Rx) and targets an older Xilinx FPGA series (Virtex-5).

III. SOFTWARE DEFINED RADIO AND COMMUNICATION STANDARDS USED AS A CASE STUDY

3G, WIFI, and LTE are widely used wireless standards. They are used in this paper as a case study to evaluate the application of DPR to a reconfigurable hardware platform for an SDR system. As each standard has its own hardware resources, implementing all these standards in parallel results in larger silicon area and higher power consumption [6]. SDR refers to the methodologies and techniques that allow the wireless communication system to be reconfigured without changing the hardware system elements [1]. SDR provides the ability to adapt to future wireless standards using the same hardware platform.

Three wireless transmitter chains (3G, WIFI, and LTE) are supported by the proposed reconfigurable SDR system, where the communication blocks have different specifications according to the mode of operation of each chain. Fig. 1 shows the transmitter blocks of the three standards and their resource requirements as reported by the Xilinx synthesis tool. The wireless standards are similar in their basic blocks. A simple communication chain consists of channel coding blocks, interleavers and code blocks for error correction, modulation blocks, and spreading and scrambling blocks. Spreading and scrambling are widely used to increase the bandwidth of the signal and to avoid interference between different channels. The three standards used as a case study are:

• 3G is the third-generation wireless mobile standard. 3G uses the Wideband Code Division Multiple Access (WCDMA) radio technology to offer greater spectral efficiency, high uplink and downlink data rates, and simultaneous voice and data.
• WIFI is a fixed wireless communication chain based on the IEEE 802.11a standard. It provides wireless connection to the internet. The WIFI physical layer uses Orthogonal Frequency Division Multiplexing (OFDM) as its modulation format and to reduce the effect of interference.
• LTE, or 4G technology, is the fourth generation of wireless mobile standards. The LTE uplink transmitter uses Single Carrier Frequency Division Multiple Access (SC-FDMA) because its lower peak-to-average power ratio (PAPR) reduces the power cost in the transmitter terminal.
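As an illustration of the SC-FDMA transmitter structure mentioned for LTE, the following minimal sketch chains a 64-point DFT with a 256-point IFFT, the sizes given for the SC-FDMA block in Fig. 1. It is not the paper's hardware design: the direct O(N^2) transform, the localized subcarrier mapping starting at index 0, and the names `dft`, `sc_fdma_symbol`, and `bpsk` are all assumptions made for this example.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) DFT; the inverse transform is scaled by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def sc_fdma_symbol(symbols, n_ifft=256, first_subcarrier=0):
    """DFT-spread the symbols, map them to subcarriers, then apply the IFFT.

    The contiguous (localized) mapping is an assumption of this sketch.
    """
    n_dft = len(symbols)
    if first_subcarrier < 0 or first_subcarrier + n_dft > n_ifft:
        raise ValueError("allocated subcarriers must fit inside the IFFT grid")
    spread = dft(symbols)                      # 64-point DFT precoding
    grid = [0j] * n_ifft                       # unused subcarriers stay zero
    grid[first_subcarrier:first_subcarrier + n_dft] = spread
    return dft(grid, inverse=True)             # 256-point IFFT


# 64 BPSK symbols in a fixed, arbitrary pattern (illustrative data only).
bpsk = [1.0 if (i * 5) % 3 else -1.0 for i in range(64)]
tx = sc_fdma_symbol(bpsk)
print(len(tx))
```

The DFT precoding step is what distinguishes SC-FDMA from plain OFDM, where the modulated symbols would be placed on the subcarrier grid directly.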

IV. SYSTEM IMPLEMENTATION

The system is implemented on a Xilinx Zynq XC7Z020LG484-1 hybrid FPGA on the ZC702 evaluation board [5]. The design flow follows the regular Xilinx DPR design flow [2] except for the partitioning step.

A. Design Parameters

Two DPR implementation approaches are considered. The first, trivial approach uses a single large RR that holds the hardware resources required by the largest configuration (the largest wireless standard). This RR is then dynamically reconfigured with one wireless standard at a time. This is known as single-region partitioning. The second approach splits this RR into several smaller ones. The design parameters are therefore 1) the number of RRs and 2) how to distribute the blocks of the different standards among these RRs optimally.

B. Design Flow

1) Synthesizing Separate SDR Blocks
The Xilinx Vivado 2015.2 synthesis tool is used to synthesize the hardware blocks of each standard. Each wireless standard has its own hardware blocks with different resource requirements, as shown in Fig. 1. The hardware blocks are given to the synthesis tool as a set of HDL files to determine the number and types of FPGA resources they require.

2) Partitioning
The partitioning constraints on Xilinx FPGAs can be modeled, as in [2], as follows:
• An RR is a fixed-size area at a specific location on the FPGA floorplan, with physical constraints that cannot be changed during runtime or reconfiguration.
• The input/output ports of an RR have a fixed number and fixed widths that cannot be changed during reconfiguration.
• Routing, i.e. the inter-partition connections between different RRs, is located in the static region of the design.
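The first two constraints can be read as an admission check: a module may be loaded into a region only if it fits the region's fixed capacity and exposes exactly the region's fixed port list. The sketch below models that check. The class names, the region capacity, and the port names and widths are hypothetical; only the per-block resource figures are taken from Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    slices: int
    bram: float
    dsp: int

    def fits_in(self, capacity):
        """True if every resource type is within the given capacity."""
        return (self.slices <= capacity.slices
                and self.bram <= capacity.bram
                and self.dsp <= capacity.dsp)


@dataclass(frozen=True)
class Module:
    name: str
    needs: Resources
    ports: tuple  # ((port_name, width), ...)


@dataclass(frozen=True)
class Region:
    name: str
    capacity: Resources  # fixed at floorplanning time
    ports: tuple         # fixed number and widths

    def accepts(self, module):
        """A module must match the port list exactly and fit the capacity."""
        return module.ports == self.ports and module.needs.fits_in(self.capacity)


STREAM_PORTS = (("data_in", 32), ("data_out", 32))  # hypothetical interface

# Hypothetical region; capacity chosen only to make the example interesting.
region = Region("RR_example", Resources(slices=50, bram=2, dsp=1), STREAM_PORTS)
modules = [
    Module("WIFI-Scrambler", Resources(46, 0, 0), STREAM_PORTS),
    Module("WIFI-Interleaver", Resources(45, 2, 1), STREAM_PORTS),
    Module("3G-Spreading Scrambling", Resources(115, 0, 0), STREAM_PORTS),
    Module("WIFI-Scrambler (16-bit ports)", Resources(46, 0, 0),
           (("data_in", 16), ("data_out", 16))),
]
for m in modules:
    print(f"{m.name}: {'ok' if region.accepts(m) else 'rejected'}")
```

In this toy setup the third module is rejected because it exceeds the slice capacity, and the fourth because its port widths differ from the region's interface.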

The transmitter blocks of the three standards consist of reconfigurable modules (RMs) that represent the different communication blocks of each standard. Each standard has a unique set of RMs that is not common among the three standards. In the proposed reconfigurable hardware SDR implementation, RMs are allocated to Reconfigurable Regions (RRs) on the FPGA floorplan. The SDR system uses DPR to switch between the three transmitter chains during runtime.

Block resource requirements shown in Fig. 1:
• LTE CRC-24 bit: 49 Slice
• LTE-Segmentation: 170 Slice, 0.5 BRAM
• Turbo encoder 1:3: 155 Slice, 1.5 BRAM, 3 DSP
• LTE-Code Block Concatenation: 41 Slice, 2 BRAM
• LTE-Scrambler: 130 Slice, 4.5 BRAM
• Rate Matching + Interleaver: 860 Slice, 4 BRAM, 1 DSP
• SC-FDMA (64 point DFT + 256 point IFFT): 1593 Slice, 5 BRAM, 25 DSP
• 3G CRC-8 bit: 34 Slice
• 3G-Segmentation: 907 Slice, 0.5 BRAM, 1 DSP
• 3G-Conv. Encoder Rate 1:2: 41 Slice
• 3G-Code Block Concatenation: 31 Slice, 2 BRAM
• 3G-Interleaver: 336 Slice, 2.5 BRAM, 2 DSP
• 3G-Spreading Scrambling: 115 Slice
• WIFI-Scrambler: 46 Slice
• WIFI-Conv. Encoder Rate 1:2: 28 Slice
• WIFI-Interleaver: 45 Slice, 2 BRAM, 1 DSP
• IFFT: 584 Slice, 0.5 BRAM, 6 DSP
• Preamble: 46 Slice, 0.5 BRAM
• Mapper (BPSK): 30 Slice (appears twice in the figure)
• Mapper (QPSK): 34 Slice, 0.5 BRAM

Fig. 1. SDR Resources Requirements For Each Transmitter Chain

For the multi-partition implementation, the partitioning algorithm introduced by Vipin et al. [8] is used to find an optimum partitioning scheme. The input to this algorithm is the set of hardware blocks (RMs) of each transmitter chain together with their hardware resource requirements (Fig. 1). The output of the algorithm is a set of 4 RRs, each with a defined set of RMs (hardware blocks of the wireless standards) allocated to it, as shown in Fig. 2.

3) Synthesizing RMs after merging
A number of hardware blocks (RMs) of each transmitter chain are merged together according to the output of the partitioning algorithm. Each merged combination, which is also called an RM, is resynthesized using the same Xilinx Vivado tool.
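The merging step can be sketched as follows: the blocks that one chain places in a region are summed into a single merged RM, and the region must be large enough for the largest merged RM in each resource type. The grouping in `ALLOCATION` is hypothetical (the actual allocation is the one in Fig. 2), the function names are made up for this example, and summing pre-synthesis figures is only an estimate, since the paper resynthesizes each merged RM. The resource figures themselves come from Fig. 1.

```python
# (slices, BRAM, DSP) per block, taken from Fig. 1.
BLOCKS = {
    "WIFI-Scrambler": (46, 0.0, 0),
    "WIFI-Conv. Encoder": (28, 0.0, 0),
    "3G CRC-8": (34, 0.0, 0),
    "3G-Conv. Encoder": (41, 0.0, 0),
    "LTE CRC-24": (49, 0.0, 0),
    "LTE-Segmentation": (170, 0.5, 0),
}

# Hypothetical allocation of blocks to ONE region, per transmitter chain.
ALLOCATION = {
    "WIFI": ["WIFI-Scrambler", "WIFI-Conv. Encoder"],
    "3G": ["3G CRC-8", "3G-Conv. Encoder"],
    "LTE": ["LTE CRC-24", "LTE-Segmentation"],
}


def merge(block_names):
    """Estimate a merged RM's resources by summing its blocks."""
    return tuple(sum(vals) for vals in zip(*(BLOCKS[b] for b in block_names)))


def region_size(allocation):
    """Return the merged RM per chain and the region size that fits them all."""
    merged = {chain: merge(blocks) for chain, blocks in allocation.items()}
    size = tuple(max(vals) for vals in zip(*merged.values()))
    return merged, size


merged, size = region_size(ALLOCATION)
for chain, res in merged.items():
    print(chain, res)
print("region must provide at least", size)
```

Because only one chain occupies the region at a time, the region is sized by the element-wise maximum over chains rather than by their sum.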


4) Floorplanning
This step is done manually by the designer, who arranges the set of RRs as rectangular regions on the FPGA floorplan. Each RR must be large enough to hold the hardware resources of its region. An inefficient distribution of RRs on the FPGA floorplan leads to inefficient utilization of the FPGA's physical resource types, and long wiring connections among RRs result in long input/output propagation delays. The FPGA floorplans of the two implementations are shown in Fig. 3.

5) Place, Route and Bitstreams Generation
Place and route is performed by the same Xilinx Vivado tool, using the design netlists and the area, timing, and floorplanning constraints generated in the steps above to map the design onto the FPGA device. Finally, a full bitstream for the static part and a set of partial bitstreams for the different RMs are generated.

Fig. 2. SDR Partitions and RMs allocation Determined by the Partitioning Algorithm

Fig. 3. SDR Floorplan on Xilinx Zynq FPGA: (a) Single Partition, (b) Multi Partitions

V. EVALUATION AND RESULTS

This section compares the single-partition DPR implementation with the DPR implementation that uses the partitioning algorithm (multi-partition implementation) for the reconfigurable hardware platform.

A. Testing Setup

The Xilinx Zynq hybrid FPGA consists of a Processing System (PS) and Programmable Logic (PL). The complete system is shown in Fig. 4. The PS is used for: 1) reading the bit files containing the RMs from external memory (SD card) into the DDR, from which they are fetched at runtime for FPGA reconfiguration, 2) sending control signals to the different hardware blocks in the system, 3) sending data to and capturing data from the SDR system blocks via AXI buses, 4) calculating the reconfiguration time, and 5) sending debug data to a PC over the UART.

The PL section consists of five main parts. The first part is the set of RRs, which are connected to a routing switch. This switch sends input data to and captures output data from the RRs, and connects the RRs in different patterns according to the partitioning algorithm output shown in Fig. 2. In the single-partition approach there is only one RR. The second part is the switch controller, which takes control signals from the PS and tells the switch which configuration mode to use. The third and fourth parts are the Xilinx Direct Memory Access (DMA) and an I/O interface for sending data to and receiving data from the SDR modules. The I/O interface manages the rate at which the communication chain takes input and produces output, according to a control signal from the PS. It is also responsible for resetting the RRs before input data is sent to them. The fifth and final part sends the partial bitstream data of the different RMs to the FPGA ICAP interface and from there to the configuration memory. FPGA reconfiguration is done internally through the ICAP interface using a partial reconfiguration controller. We used the Xilinx partial reconfiguration controller (Xil-PRC) because, according to [6, 7], it has the highest throughput among the PR controllers compared.

Fig. 4. Reconfigurable SDR System Overview

B. Hardware Utilization

Table I compares the resources and utilization of the single-partition DPR implementation with those of the 4 RRs in the multi-partition DPR implementation. LTE requires the most resources because it uses a 64-point DFT and a 256-point IFFT. The fourth RR is used only by LTE, which reduces the total reconfiguration time.
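The effect of reserving the fourth RR for LTE can be illustrated with a back-of-the-envelope estimate based on the partial bitstream sizes listed in Table I. The sketch assumes that reconfiguration time scales with the amount of bitstream data loaded, that 3G and WIFI load RR1 to RR3 while LTE loads all four regions, and that the total is summed over the three configurations. The source states only that RR4 is used by LTE alone, so the per-standard region lists and all names in the code are assumptions.

```python
# Partial bitstream sizes in Kbyte, as listed in Table I.
BITSTREAM_KB = {"Single": 723, "RR1": 77, "RR2": 103, "RR3": 320, "RR4": 324}
STANDARDS = ("3G", "WIFI", "LTE")

# ASSUMPTION: which regions each standard reconfigures in the multi-partition
# design. Only "RR4 is used by LTE alone" comes from the paper.
MULTI_REGIONS = {
    "3G": ("RR1", "RR2", "RR3"),
    "WIFI": ("RR1", "RR2", "RR3"),
    "LTE": ("RR1", "RR2", "RR3", "RR4"),
}


def total_single_kb():
    """Every configuration reloads the one large region."""
    return len(STANDARDS) * BITSTREAM_KB["Single"]


def total_multi_kb():
    """Each configuration loads only the regions it uses."""
    return sum(BITSTREAM_KB[r] for s in STANDARDS for r in MULTI_REGIONS[s])


single_kb = total_single_kb()
multi_kb = total_multi_kb()
reduction_pct = 100 * (single_kb - multi_kb) / single_kb
print(f"single: {single_kb} KB, multi: {multi_kb} KB, "
      f"reduction: {reduction_pct:.1f}%")
```

Under these assumptions the data volume drops by roughly 15.9%, which is in line with the ~16% reduction in total reconfiguration time reported in the paper. Measured times also depend on controller throughput and other overheads that this estimate does not model.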

TABLE I. SINGLE PARTITION AND MULTI-PARTITIONS DPR IMPLEMENTATIONS RESOURCES COST
[Table I lists, for each configuration (3G, WIFI, LTE), the Slice, BRAM, and DSP counts and the utilization (Util.%) of each region. Partial bitstream sizes: Single Partition 723 Kbyte; RR1 77 Kbyte; RR2 103 Kbyte; RR3 320 Kbyte; RR4 324 Kbyte.]

C. Performance Comparison

The two proposed implementations are compared on the total partition area, the total reconfiguration time, and the total estimated power consumption (see Fig. 5).

The total partition area is calculated as the number of tiles reserved for all the partition regions of the design. In 7-series Xilinx FPGAs, one FPGA tile equals 36 configuration frames and one frame equals 101 × 32 bits [2]. The total area decreases by ~4.5% in the multi-partition implementation, although the two implementations use the same hardware resources. This is because, during floorplanning, using part of a tile reserves the whole tile even if it is not fully used.

The total reconfiguration time is the sum of the reconfiguration times of all configurations, or the worst-case handover time between wireless standards. High-speed DPR controllers are used to achieve a fast reconfiguration time, and the reconfiguration time is calculated by the ARM processor on the PS side [6]. The total reconfiguration time decreases by ~16% when the partitioning algorithm is used. The choice of RR locations during floorplanning plays an important role in reducing this time, so the designer should be aware of the FPGA floorplan.

The power consumption is estimated using the Vivado 2015.2 tool. The maximum power consumption increases with the number of reconfigurable frames. As a result, the multi-partition approach achieves a 9.8% reduction in power compared to the single-partition approach.

Fig. 5. Performance comparison between the single- and multi-partition DPR implementations: (a) Total Partitions Area (Reconfigurable Frames), (b) Total Reconfiguration Time (ms), (c) Max. Power Consumption (mW). Each panel compares the single region DPR implementation with the DPR implementation using the partitioning algorithm.

VI. CONCLUSION AND OUTLOOK

Hardware reusability is a novel approach to implementing SDR systems. DPR is an up-and-coming technique for implementing the next generation of reconfigurable systems. In this paper, two DPR partitioning (resource allocation) schemes, using single and multiple partitions, are presented for a reconfigurable hardware platform for SDR systems. The proposed reconfigurable platform is designed to support a large number of wireless standards with efficient resource utilization and a reduced handover time between wireless chains. For future work, it is recommended to carry out further design optimizations at the level of the RM hardware design to increase system performance, and to find similarities among the wireless chains so that some blocks can be combined and shared, leading to a reduction in reconfiguration time. It is also necessary to implement an efficient routing network to reduce the latency of the rerouting step after system reconfiguration.

VII. ACKNOWLEDGEMENT

This research was partially funded by Cairo University, ITIDA, NTRA, NSERC, Zewail City of Science and Technology, AUC, the STDF, Intel, Mentor Graphics, SRC, ASRT and MCIT.

REFERENCES

[1] G. Sklivanitis, A. Gannon, S. N. Batalama and D. A. Pados, "Addressing next-generation wireless challenges with commercial software-defined radio platforms," in IEEE Communications Magazine, vol. 54, no. 1, pp. 59-67, January 2016.
[2] Xilinx Inc., "Partial Reconfiguration User Guide UG909," v2016.1, April 2016.
[3] S. Shreejith, B. Banarjee, K. Vipin and S. Fahmy, "Dynamic cognitive radios on the Xilinx Zynq hybrid FPGA," in International Conference on Cognitive Radio Oriented Wireless Networks, 2015, pp. 427-437.
[4] A. Sadek, H. Mostafa, A. Nassar, Y. Ismail, "Towards The Implementation Of Multi-Band Multi-Standard Software-Defined Radio Using Dynamic Partial Reconfiguration," International Journal of Communication Systems, 15 June 2017.
[5] Xilinx Inc., "ZC702 Evaluation Board for the Zynq-7000 XC7Z020 All Programmable SoC UG850," September 2015.
[6] A. K. ELdin, A. Mohamed, A. Nagy, Y. Gamal, A. Shalash, Y. Ismail, and H. Mostafa, "Design Guidelines for the High-Speed Dynamic Partial Reconfiguration Based Software Defined Radio Implementations on Xilinx Zynq FPGA," International Symposium on Circuits and Systems (ISCAS 2017), Baltimore, USA, IEEE, May 2017.
[7] A. Hassan, R. Ahmed, H. Mostafa, H. A. H. Fahmy and A. Hussien, "Performance evaluation of dynamic partial reconfiguration techniques for software defined radio implementation on FPGA," 2015 IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Cairo, 2015, pp. 183-186.
[8] K. Vipin and S. A. Fahmy, "Automated Partitioning for Partial Reconfiguration Design of Adaptive Systems," 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops, and Phd Forum, Cambridge, MA, 2013, pp. 172-181.
