Implementation of an FFT on the FPGA of USRP2 ...

Implementation of an FFT on the FPGA of USRP2 boards
J.L. Buthler, M. Buhl, G. Berardinel, A.F. Cattoni (presenter)
Aalborg University, Denmark

Outline
• Introduction
• Motivation
• Challenges
• Solution
• Results
• Conclusion

Introduction
• Software Defined Radio (SDR) testbeds can greatly benefit the research environment thanks to their flexibility and reconfigurability
• Among SDR platforms, the Universal Software Radio Peripheral (USRP) by Ettus Research has achieved widespread acceptance, owing to:
  • Cost-effectiveness, which makes it suited to the realization of large-scale testbeds
  • Support for a large set of interchangeable Radio-Frequency (RF) front-ends
  • Connectivity with the host PC through the Gigabit Ethernet port, ensuring compatibility with modern commercial off-the-shelf PCs
  • User-space interfacing through the USRP Hardware Driver (UHD), on both Windows and Linux
  • Support for the open-source GNU Radio software radio development framework

Motivation
• The current FPGA firmware only contains basic operations such as:
  • Communication between host PC and USRP
  • Filtering
  • Analog/digital conversion
  • Up/down sampling of the RF signal
• Implementing further digital signal processing on the remaining resources can boost data rate performance
• Two main reasons for focusing on the FFT:
  • It avoids additional communication between host PC and USRP, which would decrease the data rate due to stress on the Ethernet connection
  • It was found to be one of the computationally heavy tasks of modern communication chains, such as LTE, and can theoretically gain a speed-up of up to N/2 through parallel processing (N being the number of bins in the FFT)
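The N/2 figure is consistent with the structure of a radix-2 FFT, where every stage consists of N/2 butterflies that do not depend on each other and could in principle run at the same time. The short sketch below counts them; the function name is hypothetical and the radix-2 structure is an assumption, since the slides only name the speed-up.

```python
def radix2_butterfly_counts(n):
    """Return (stages, butterflies_per_stage, total_butterflies)
    for an n-point radix-2 FFT, n being a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1      # log2(n)
    per_stage = n // 2               # independent butterflies in each stage
    return stages, per_stage, stages * per_stage


stages, per_stage, total = radix2_butterfly_counts(1024)
print(stages, per_stage, total)
```

With one butterfly unit the stage takes N/2 steps; with N/2 units it takes one, which is where an upper bound of N/2 on the speed-up comes from.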



Challenges
• The implementation should be compatible with the existing firmware
• New versions of the USRP firmware use most of the external RAM for the timestamp feature (97% on the USRP2 and 50% on the USRP N200)
• The current dataflow is serialized, but the FFT needs to operate on a packet of data
• The 100 MHz clock sets high requirements for the amount of parallelization/pipelining
• Parallelization requires a lot of multipliers (at least 4 for every 2 input samples)
• Pipelining requires memory and multipliers
• The FFT from Xilinx's Core Generator uses many resources and does not adapt intuitively to the existing dataflow of the FPGA firmware
• The FFT has to be designed from scratch with resource efficiency in mind
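The figure of 4 multipliers per 2 input samples matches a radix-2 butterfly: it takes two complex samples and performs one complex multiplication with a twiddle factor, which expands to four real multiplications. A minimal sketch, assuming a decimation-in-time butterfly (the slides do not show the butterfly itself, and the function name is hypothetical):

```python
def butterfly(a, b, w):
    """Radix-2 butterfly on (re, im) tuples.

    Returns (a + w*b, a - w*b). The complex product w*b is written out
    so that the four real multiplications are visible.
    """
    ar, ai = a
    br, bi = b
    wr, wi = w
    tr = br * wr - bi * wi   # multiplications 1 and 2
    ti = br * wi + bi * wr   # multiplications 3 and 4
    return (ar + tr, ai + ti), (ar - tr, ai - ti)


# Example: a = 1, b = j, w = -j, so w*b = 1
print(butterfly((1, 0), (0, 1), (0, -1)))
```

Running P butterflies in parallel therefore needs 4·P hardware multipliers in this form, which is why parallelization is costly on a small FPGA.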

Solution
• A general data processing module was designed to adapt to the dataflow of the current firmware
• Data is intercepted between the existing data processing block (CIC filter) and the VITA module (which handles timestamping and adds other metadata to the samples)

Solution
• The module is transparent to the existing firmware, since it adapts the dataflow using the strobe signal
• Uses a pipeline to collect samples for processing
• The grey box is a setting register reused from the original firmware; the plan is to use it to obtain the desired FFT length and then let the module determine the cyclic prefix (CP) length itself
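The strobe-based collection can be pictured with a small software model. This is only a behavioral sketch under assumptions: the real module is FPGA logic that the slides do not show, and the name `collect_blocks` and the `(strobe, sample)` representation of one clock cycle are hypothetical.

```python
def collect_blocks(clocked_stream, block_len):
    """Group strobed samples into blocks of block_len.

    clocked_stream yields one (strobe, sample) pair per clock cycle;
    a sample is only valid on cycles where strobe is truthy.
    """
    block = []
    for strobe, sample in clocked_stream:
        if not strobe:
            continue                 # no new sample on this clock cycle
        block.append(sample)
        if len(block) == block_len:
            yield block              # a full packet is ready for the FFT
            block = []


cycles = [(1, 10), (0, 99), (1, 11), (1, 12), (0, 0), (1, 13)]
print(list(collect_blocks(cycles, 2)))
```

Samples that arrive on cycles without a strobe are ignored, so the serialized stream is turned into packets without changing its timing on the input side.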

Solution
• Uses the Cooley-Tukey FFT algorithm
• Resource optimized
• 16-bit fixed-point operations
• Able to support different lengths without loading a new firmware image
• Supports FFT lengths of up to 1024 using the 100 MHz clock
• Uses dual-access RAM to load the I and Q samples simultaneously
• Scaling by 1/N to avoid overflow
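To make the combination of Cooley-Tukey, 16-bit fixed-point arithmetic and the 1/N factor concrete, here is a minimal software sketch. It assumes a radix-2 decimation-in-time structure, Q1.15 twiddle factors, and a right shift by one bit after every stage so that the log2(N) stages together give 1/N; none of these details are stated in the slides, and the sketch does not model 16-bit wrap-around or saturation.

```python
import math

Q = 15  # assumed fractional bits of the 16-bit twiddle factors (Q1.15)


def to_q15(x):
    """Quantize x in [-1, 1] to a signed 16-bit integer."""
    return max(-32768, min(32767, round(x * (1 << Q))))


def bit_reverse(i, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fixed_fft(re, im):
    """Radix-2 DIT FFT on integer I/Q lists, output scaled by 1/N."""
    n = len(re)
    bits = n.bit_length() - 1
    if n < 2 or (1 << bits) != n:
        raise ValueError("length must be a power of two >= 2")
    # Reorder the input so the butterflies can work on adjacent groups.
    xr = [re[bit_reverse(i, bits)] for i in range(n)]
    xi = [im[bit_reverse(i, bits)] for i in range(n)]
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                ang = -math.pi * k / half
                wr, wi = to_q15(math.cos(ang)), to_q15(math.sin(ang))
                i, j = start + k, start + k + half
                tr = (xr[j] * wr - xi[j] * wi) >> Q
                ti = (xr[j] * wi + xi[j] * wr) >> Q
                # Halve after every stage: log2(N) halvings give 1/N.
                xr[i], xr[j] = (xr[i] + tr) >> 1, (xr[i] - tr) >> 1
                xi[i], xi[j] = (xi[i] + ti) >> 1, (xi[i] - ti) >> 1
        half *= 2
    return xr, xi


# An impulse of height 8000 spreads evenly over all 8 bins, scaled by 1/8.
print(fixed_fft([8000, 0, 0, 0, 0, 0, 0, 0], [0] * 8))
```

Spreading the scaling over the stages keeps the complex magnitude from growing from one stage to the next, at the cost of some precision in the low-order bits.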

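Results – Xilinx coregen vs. proposed design

| Specification        | Xilinx Core Gen. | Proposed design | Required       |
|----------------------|------------------|-----------------|----------------|
| FFT size             | 1024             | 1024            | 1024           |
| Input data width     | 16               | 16              | 16             |
| Twiddle factor width | 16               | 16              | 16             |
| CP insertion         | No               | No              | Yes            |
| Speed [Msps]         | ~30              | ~16             | 14.336         |
| Latency [µs]         | 34               | 64              | 71             |
| Scaled               | Yes              | Yes             | Yes            |
| Data format          | Fixed point      | Fixed point     | Fixed point    |
| Optimized for        | Resource usage   | Resource usage  | Resource usage |

| Resource         | Xilinx Core Gen. | Proposed design | Remaining on the USRP2 |
|------------------|------------------|-----------------|------------------------|
| Slice            | 1909             | 1252            | 11878                  |
| Slice Flip Flops | 2816             | 279             | 23756                  |
| 4 input LUTs     | 2796             | 2331            | 23756                  |
| BRAMs            | 7                | 4               | 2                      |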
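The latency figures for the 1024-point case appear consistent with the time needed to stream one block of 1024 samples at the listed rate. This is an observation about the numbers, not something the slides state. The sketch below reproduces the arithmetic and also derives the clock-cycle budget per sample at the required rate, using the 100 MHz FPGA clock mentioned earlier; the function names are hypothetical.

```python
def block_latency_us(fft_size, rate_msps):
    """Time in microseconds to stream fft_size samples at rate_msps."""
    return fft_size / rate_msps      # samples / (samples per microsecond)


def cycles_per_sample(clock_mhz, rate_msps):
    """FPGA clock cycles available for each incoming sample."""
    return clock_mhz / rate_msps


for label, rate in [("Xilinx Core Gen.", 30.0),
                    ("Proposed design", 16.0),
                    ("Required", 14.336)]:
    print(f"{label}: {block_latency_us(1024, rate):.1f} us")

print(f"cycles per sample at 14.336 Msps: {cycles_per_sample(100.0, 14.336):.2f}")
```

Under this reading, the proposed design at about 16 Msps stays ahead of the required 14.336 Msps, while roughly seven clock cycles are available per sample.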

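Results – Large resource increase if a 2048-point FFT is to be realized

| Specification        | Xilinx Core Gen. |
|----------------------|------------------|
| FFT size             | 2048             |
| Input data width     | 16               |
| Twiddle factor width | 16               |
| CP insertion         | No               |
| Speed [Msps]         | ~30              |
| Latency [µs]         | 34               |
| Scaled               | Yes              |
| Data format          | Fixed point      |
| Optimized for        | Resource usage   |

| Resource         | Xilinx Core Gen. | Remaining on the USRP2 |
|------------------|------------------|------------------------|
| Slice            | 4307             | 11878                  |
| Slice Flip Flops | 6329             | 23756                  |
| 4 input LUTs     | 5905             | 23756                  |
| BRAMs            | 11               | 2                      |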
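The resource figures can be compared mechanically. The sketch below assumes that the "on the USRP2" column gives the resources left over before any FFT is added; the slides do not say so explicitly. The helper name `over_budget` is hypothetical and the numbers are copied from the result tables.

```python
# Assumption: resources left on the USRP2 before adding an FFT core.
REMAINING_USRP2 = {"Slice": 11878, "Slice Flip Flops": 23756,
                   "4 input LUTs": 23756, "BRAMs": 2}

DESIGNS = {
    "Xilinx Core Gen., 1024": {"Slice": 1909, "Slice Flip Flops": 2816,
                               "4 input LUTs": 2796, "BRAMs": 7},
    "Proposed design, 1024": {"Slice": 1252, "Slice Flip Flops": 279,
                              "4 input LUTs": 2331, "BRAMs": 4},
    "Xilinx Core Gen., 2048": {"Slice": 4307, "Slice Flip Flops": 6329,
                               "4 input LUTs": 5905, "BRAMs": 11},
}


def over_budget(usage, remaining):
    """Return the resource names whose usage exceeds what remains."""
    return [name for name, used in usage.items() if used > remaining[name]]


for label, usage in DESIGNS.items():
    print(label, "->", over_budget(usage, REMAINING_USRP2))
```

Under that assumption, logic resources are plentiful in every case and block RAM is the only constraint that is exceeded, which fits the emphasis on memory in the conclusions.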

Conclusions
• It is feasible to implement new data processing features on the remaining fabric of the USRP2; however, the memory usage of the "timestamp" feature is quite high
• A resource-optimized FFT has been designed which supports an FFT length of up to 1024, given the current FPGA clock frequency
• Xilinx coregen is faster than the proposed design; however, this speed is unnecessary and comes at the cost of even further RAM usage
• The USRP N200 provides more available RAM and should be the target for implementation
