FPGA Implementation of a Novel Receiver Diversity

2007 IEEE International Conference on Signal Processing and Communications (ICSPC 2007), 24-27 November 2007, Dubai, United Arab Emirates

FPGA IMPLEMENTATION OF A NOVEL RECEIVER DIVERSITY COMBINING TECHNIQUE FOR WIRELESS SIMO SYSTEMS R. Ayoubi University of Balamand Koura, North, Lebanon

J. P. Dubois University of Balamand Koura, North, Lebanon

engineering costs, and ability to re-program in the field to fix bugs makes FPGA an attractive technology. The FPGA architectures [1] are dominated by interconnect which makes them very flexible in terms of the range of designs that are practical for implementation within them. These architectures contain embedded memories and higher-level embedded functions such as adders and multipliers. Many modern FPGAs also support full or partial in-system reconfiguration, allowing the ability to be reprogrammed at "run time". FPGA architectures have been modernized by combining logic blocks and interconnects with embedded microprocessors to form a complete "system on a programmable chip" (e.g. Xilinx Virtex-II PRO and Virtex-4 devices, and the Atmel FPSLIC [2]). FPGAs offer architecture that are well suited for algorithms that require massive parallelism. The inherent parallelism of the logic resources as well as the availability of complex hard cores (such as multipliers, large RAM blocks, DSP slices, etc.) on the FPGA allow for considerable computation throughput even at sub500MHz clock rates. In fact, today’s generation of FPGAs can implement around 100 single precision floating point units, all of which can compute a result every single clock cycle. Applications of FPGAs include DSP, image processing, computer vision, speech recognition, cryptography, telecommunication, computer hardware emulation and a growing range of other areas. High performance computing DSP applications with FFT or convolution computational kernels are increasingly being performed on the FPGA instead of a microprocessor [3]. Due to the abundance of logic fabrics and state elements in the FPGA, one of the main techniques to improve performance is pipelining. In this paper, an efficient FPGA implementation of a novel diversity combining technique [4 , 5] in wireless single-input multiple-output (SIMO) systems is proposed, where pipelining techniques play an important role in achieving a very high throughput. The rest of this paper is organized as follows. In section 2, we describe our previously published novel RMSGC diversity combining technique whose performance was investigated for BPSK transmission over Rayleigh fading channels in single-input multipleoutput (SIMO) systems (a sub-component of MIMO). In section 3, we propose the FPGA implementation of

ABSTRACT In this paper, we study FPGA implementation of a novel receiver diversity combining technique, RMSGC for wireless transmission over fading channels in SIMO systems. Prior published results using ML-detected RMSGC diversity signal driven by BPSK showed superior bit error rate performance to classical diversity combining schemes. RMSGC was shown to be nearoptimal in the sense that it was very close to that of the theoretically optimal MRC. Since MRC requires estimation of the channel coefficients, it is complicated and expensive, and it is not practical for non-coherent modulation and differentially coherent modulation. RMSGC, on the other hand, is a more attractive and simpler scheme since it does not require knowledge of the channel information states. The main drawback of RMSGC is that it is a non-linear technique, thus successful FPGA implementation of it using pipeline techniques is needed as a wireless communication testbed for practical real-life situations. Simulation results showed that the hardware implementation was efficient both in terms of speed and area. Index Terms— field-programmable gate array; maximal-ratio combining; multiple-input multiple-output channels; pipelining technique; root-mean-square gain combining.

1. INTRODUCTION Since its introduction in the late eighties, FPGA (fieldprogrammable gate array) technology has been steadily improving in performance, density, power consumption, and overall system cost. This is due to two main factors. First, advancement in VLSI technology allows few million logic gates to be fabricated on a single die. FPGAs include a relatively large number of programmable logic elements, typically ranging from tens of thousands to several million programmable logic elements. Second is the availability of powerful software tools that can simplify conception, simulation, and implementation of complex designs. Other advantages such as a shorter time to market, non-recurring

1-4244-1236-6/07/$25.00 © 2007 IEEE

O. Abdul-Latif University of Bath Claverton Down, Bath, U.K.

37

received from different channels and apply gain combining schemes to enhance the signal-to-fading-noise ratio and improve the system’s performance. Although the wireless system we studied is a (1 x L) SIMO, the results can be extended to (M x L) MIMO systems using simple spatial cycling techniques. Under such techniques, MIMO systems are implemented by using only one transmitter at a time and by cycling over the M transmitters periodically, effectively employing a SIMO structure at every transmission period.

RMSGC using efficient pipelining techniques. Future extension of the design includes implementing a detector and interfacing a wireless BPSK transceiver to the FPGA. 2. NOVEL RECEIVER DIVERSITY COMBINING Antenna diversity is a promising technique for overcoming multipath fading in a wireless channel. Diversity techniques are generally used to generate multiple signal branches between transmitter and receiver and have been shown to improve mean signal strength [6] and reduce signal level fluctuations in the fading channel. There are four basic pure diversity combining schemes, depending on the complexity restrictions placed on the communication system and the amount of channel state information available at the receiver: Maximum ratio combining (MRC), equal gain combining (EGC), maximum gain combining (MGC) or selective combining (SC), and post-detection square-law combining (SLC). The term “pure” is used as a distinction from recently proposed “hybrid” switched techniques [7]. Diversity combining use the multiple signal branches in the wireless channel advantageously by improving the antenna diversity and system performance, resulting in lower bit error rate (BER) and higher channel capacity. MRC provides optimal performance but suffers from high implementation complexity and is not practical in systems using differentially coherent (e.g. DPSK, DQPSK) and non coherent modulation (e.g. BFSK) since it requires estimation of fading channel coefficients. On the other hand, MRC can be used in conjunction with unequal energy systems such as M-QAM where estimation of the diversity paths amplitudes is needed for automatic gain control purposes. EGC is widespread because of its practicality. In fact, it can be simply implemented by equally weighing each branch before combining (summing) them. Although suboptimal, ECG with coherent detection is often an attractive solution since it does not require estimation of the fading amplitudes and hence, it is often limited in practice to coherent modulations with equal energy symbols (e.g. MPSK). Other alternative combining techniques such as MGC (or SC) and post-detection square-law combining (SLC) are also used because of their reduced complexity relative to the optimum MRC scheme. In MGC, the receiver simply selects the signal path with the highest gain. Although conceptually simple, MGC is not as popular as EGC since it requires continual monitoring of the diversity channels and its performance is very suboptimal. SLC was initially introduced by Proakis [8] and later analysed by a number of researchers to non-coherent and differentially modulated signals. The main distinctive feature of SLC is that it is a post-detection combining technique and requires simultaneous detection of the received diversity signals. SLC is thus not suitable for more complicated detection schemes such as SVM [4]. For SIMO systems (sub-components of MIMO) with L antennas at the receiver, diversity receivers extract multiple signal branches or copies of the same signal

2.1 Root-Mean-Square Gain Combining In this section, we introduce our receiver diversity gain combining scheme, termed root-mean-square gain combining (RMSGC) and illustrated in Fig. 1. In the RMSGC technique, diversity signal paths arrive at the L receiver antennas, where each signal is squared using a square law device. Depending on the polarity of the original arriving signal, the squared signal is inverted (if originally negative), which is achieved by a signum filter. Then all the signals are summed and the composite diversity signal is processed using a square-root (of the absolute value) device before being sent to a detector. If the polarity of the composite signal is negative, the square-rooted signal is inverted using a signum gain. The envelope β of the RMSGC signal takes the form

β ≈

,

¦γ

I R

ρ >> or , ,

(1)

I =

where γ IR is the received diffuse envelope of the i-th diversity signal path, L is the no. of antennas, ρ is the average SNR per bit. The phases are assumed to be known and corrected.

⋅

Fig. 1 Novel RMSGC diversity combining scheme.

The fading power statistic ȕ2 is consistent with that of a Gamma law with shaping parameter L and scaling parameter Pdif (or chi-square with 2L degrees of freedom) and the envelope is Nakagami-m distributed. 2.2 Signal-to-Scattering Noise Ratio We define the signal-to-scattering-noise ratio

SSNR E ( β 2 ) / σ β 2

38

(2)

the summation. Finally, an eight-stage square root unit is used. After an initial latency of twelve cycles, every clock cycle an output is generated. This leads to a very high throughput, processing eight parallel channels at the speed of the clock. The main units in our design are square, adder, and square root units. The square unit need not be pipelined. Since a square unit can be implemented using a multiplier, and since a large number of multipliers is availably hardwired inside the FPGA chip, these units can run at very high speed without pipelining. Moreover, the squaring of all eight channels can be done simultaneously. Fig. 2 shows the square unit stage. Notice that after squaring the input, either the result of its negative is stored in the pipeline register. Although not shown in the figure, this stage is replicated eight times.

which corresponds to the reciprocal of speckle contrast in active radar imagery [9]. Since the SNR contains the random multiplicative fading power term β 2 and the constant AWGN power spectral density term N0, an equivalent metric to the SSNR’s reciprocal is the signalto-noise ratio peakedness SNRP = σ ρ / E ( ρ ) , where ρ

is the random SNR. This metric measures the “amount” or “degree” of randomness of the SNR and is more significant than the average SNR, especially in shadow fading environment where there will be significant fluctuations of the SNR around an underlying mean value. Similarly, a large SSNR metric signifies that the signal’s power level fluctuations are small relative to the mean signal’s power strength, indicating “reliable” communication. According to Eq. (2), this metric is invariant to a constant gain. This is consistent with the fact that direct constant gain amplification at the receiver’s antennas is ineffective in a fading environment since a linear amplifier of constant gain C will amplify the faded power at every received symbol by C while also amplifying the variance of the power by C2. The mean relative to the standard deviation remains unchanged (unity), thus rendering the amplifiers statistically transparent to the system. We also define the diversity combining statistical gain G = SSNRSIMO / SSNRSISO = SNRPSISO / SNRPSIMO

ADC Input

Square Unit

Two’s Complement Unit

MUX

Register

Fig. 2 First pipeline stage performs squaring. Either the result of the square or its negative is written to the pipeline register depending on whether the ADC input is positive or negative. Not shown in the figure, this stage is replicated 8 times.

Similarly, many adder units can run in parallel. However, it would not be practical to sum all eight elements simultaneously. This could lead to a prohibitively large adder and/or slow unit. Thus, the sum is done in a tree fashion and pipelined as shown in Fig. 3. The summation consists of three pipeline stages. The first stage performs four parallel additions on the inputs from the square stage. Next, two parallel additions take place. The final addition is written to the pipeline register in absolute value along with a flag (not shown in the figure) in order to pass information to the next stage about the sign of the results. In general, such tree-like implementation takes log 2 ( L) to sum L numbers (L channels). The bottleneck of the FPGA implementation is the square root unit. After analyzing the timing reports generated by Altera’s Quartus II software, we found, as expected, that the timing limitation was due to the square root unit. This limited the clock speed to around 30 MHz. After experimenting with different pipelining stages for the square root unit, we found that eight-stage pipeline is the optimal number of stages, thus improving the clock speed to more than 115 MHz as shown in Table I. Fig. 4 shows the implementation of the square root unit. The final result could be either the square root or its negative depending on information propagated from the last addition stage.

(3)

For a single-input-single-output (SISO) system, the SSNR is unity (or asymptotically one) under various fading conditions (except for Rician and shadowed multipath fading). It can be readily shown that in the case of RMSGC, the SSNR is enhanced by L (e.g., the RMSGC statistical gain of 4 antennas is 100%). 3. FPGA IMPLEMENTATION

In the following proposed design, reference is made to the functional block diagram of the SIMO-RMSGC processor of Fig. 1. After the data is received, analog-todigital conversions should take place on all channels (in our case, we are using eight channels for our implementation example). The data is then digitally processed using FPGA technology. The target FPGA family was an Altera Cyclone EP1C6F256 due to its availability in our labs. In order not to clutter the figure, the FPGA implementation of the receiver is shown in three separate figures in a sequential order starting from receiving the ADC signals. After this, squaring of the inputs takes place. Next, the outputs of the square units are fed into a tree of adders. Lastly, square root is applied to the final summation. Even though an example of eight channels is illustrated, the proposed implementation can be easily generalized to any number of channels. In order to keep up with the throughput of the received signals, pipelining technique was used. The overall design consists of twelve pipeline stages. The first stage is the squaring unit. Next, we have three stages for

TABLE I IMPLEMENTATION RESULTS Total Logic Elements 2405 out of 5980 (40%) Max. clock frequency 116.23 MHz (period = 8.604 ns )

39

Fig. 3 The summation consists of three pipleline stages in a tree fashion. The final addition is written to the pipeline register in absolute value.

Fig. 4 Square root or its negative is written in the final pipeline register.

Fig. 5 Simulation results of the FPGA implementation of the RMSGC-based SIMO diversity

speed was possible due to sub-dividing the complex operations into smaller computation stages and pipelining all the stages. Without a pipeline scheme, standard techniques would lead to each sample going through several computation stages before the final output is produced, whereas with pipelining techniques there will be overlapping in computation leading to a single sample per clock (10 ns per sample). This limit is more than sufficient for practical signal processing with sampling rates set at the fundamental Nyquist rate. The performance of the FPGA-embedded RMSGC was tested in terms of the SSNR metric which showed a diversity gain approximately equal to 8 , as expected.

The simulated timing diagram is shown in Fig. 5. It is clear that after an initial 12 cycles latency, a result is generated every clock cycle. Table I summarizes the mapping results in terms of logical resources and maximal clock period on an Altera Cyclone FPGA. For our eight-channel receiver, the utilization of the FPGA chip did not exceed 40%, which means that we can easily upgrade our design into cases including larger number of channels on the same Cyclone used. Moreover, the clock frequency achieved was more than 115 MHz. Using randomly generated digital data, we observed the output data of the FPGA-implemented RMSGC and verified that the diversity combining gain of Eq. (3) was approximately equal to 8 , as expected theoretically [5].

5. REFERENCES 4. CONCLUSIONS AND FUTURE WORK [1] S. Brown and J. Rose, "Architecture of FPGAs and CPLDs: Tutorial," IEEE Design & Test of Comp., vol. 13(2), pp. 42-57, 96. [2] http://www.fpga-faq.org/FPGA_Boards.shtml [3] L. Bylanger, “Combining DSPs and FPGAs in NextGeneration Multimode Wireless Handset Designs,” DSPFPGA.com Magazine, May, 2007. [4] O. Abdul-Latif and J. Dubois, “LS-SVM Detector for RMSGC Diversity in SIMO Channels,” ISSPA’07, Dubai, 2007. [5] J. Dubois and O. Abdul-Latif, “Improved Receiver Diversity Processing over SIMO Fading Channels,”ICSP’07,Dubai,submitted. [6] J.Anderson, “Antenna Arrays in Mobile Communication,” IEEE Trans. Antennas & Propagation, vol. 42, pp.12-16, 2000. [7] M. Simon, M. Alouini, Digital Communication over Fading Channels, 2nd edition, John Wiley & Sons, 2005. [8] J. Proakis, Digital Communications, McGraw-Hills, 2000. [9] J. Daba and M. Bell, “Statistics of the Scattering Cross Section of a Small Number of Random Scatterers,” IEEE Trans. on Antennas and Propagation, August 1995.

In this paper, we proposed an FPGA implementation of a novel RMSGC receiver diversity combining scheme for wireless SIMO channels. RMSGC was previously proven to be superior to pure combining techniques such as EGC, MGC, SLC, and near-optimal in the sense that its performance was very close to the theoretically optimal MRC. The latter is not practical since it requires perfect knowledge of channel state information and cannot be implemented in non coherent and differential modulation. The only disadvantage of RMSGC, that of being a non linear technique, was overcome in this work by verifying that the proposed hardware implementation was efficient both in terms of speed and area. Moreover, we were able to achieve a clock frequency above 115 MHz. This high

40

FPGA Implementation of a Novel Receiver Diversity

FPGA Implementation of a Novel Receiver Diversity

Suggest Documents

ofdm transmitter and receiver implementation on fpga

An FPGA implementation of WCDMA-RAKE receiver ...

An FPGA implementation of WCDMA-RAKE receiver ...

fpga implementation of tunable fft for sdr receiver - International ...

design and implementation of ofdm transmitter and receiver on fpga ...

Design, Simulation and FPGA Implementation of a Novel ... - IAENG

A Novel FPGA Implementation of Adaptive Color Image Enhancement

A novel polarization diversity receiver for pmd mitigation ... - UMBC

A Novel Receiver Diversity Combining Technique for Internet ... - wseas

Implementation of a spread spectrum receiver with

Novel FPGA-Based Implementation of Median and Weighted Median ...

Novel FPGA-Based Implementation of Median and Weighted Median

FPGA IMPLEMENTATION OF A VEDIC CONVOLUTION ALGORITHM

FPGA Implementation of a Frame Synchronization ... - Radioengineering

Novel Architecture for Efficient FPGA Implementation of ... - IEEE Xplore

FPGA Implementation of a Lattice Quantum Chromodynamics

FPGA Implementation of Image Enhancement

FPGA Implementation of Generalized Hebbian

FPGA Implementation of Polynomial Evaluation

algorithms and FPGA implementation

High Performance FPGA Implementation of

digital communication receiver based on fpga design

A Fast FPGA Implementation of a General Purpose ... - Semantic Scholarhttps://www.researchgate.net/.../A-Fast-FPGA-Implementation-of-a-General-Purpose-...

A HIGH THROUGHPUT FPGA CAMELLIA IMPLEMENTATION