Implementation of a Channel Equalizer for OFDM Wireless LANs
Moisès Serra, Universitat de Vic, C/. de la Laura, 13, VIC (Barcelona), Spain, [email protected]
Pere Martí, Universitat de Vic, C/. de la Laura, 13, VIC (Barcelona), Spain, [email protected]
Jordi Carrabina, Universitat Autònoma de Barcelona, QC2060. ETSE. Campus UAB, Bellaterra (Barcelona), Spain, [email protected]
Abstract
This paper presents an implementation of a channel equalizer for a wireless OFDM system according to the IEEE 802.11a and HIPERLAN/2 standards. In order to implement the equalizer, algorithms of low computational complexity have been analyzed. A rapid prototype design flow is presented and applied to the prototyping of these equalizer algorithms in real time on an FPGA platform. A new point of view on the prototyping design flow and the verification process is achieved through the latest generation of system level design environments for DSPs in FPGAs. These environments, called visual data flows, are ideally suited for modeling DSP systems, since they allow a high level of functional abstraction with different data types and operators. The implemented channel equalizer reaches a high degree of hardware simplicity and efficiency, covering the standard specifications.
1 Introduction
The High Performance Local Radio Area Network (HIPERLAN/2) has been specified by the European Telecommunications Standards Institute (ETSI) for short-range radio access on the 5 GHz band for mobile terminals [1]. The standard defines physical layer bit rates ranging from 6 to 54 Mbit/s. The transmission format on the physical layer is a burst that consists of preamble and data fields. HIPERLAN/2 uses Orthogonal Frequency Division Multiplexing (OFDM) modulation, due to its efficient usage of the available frequency bandwidth and its robustness to channel fading.
The channel equalizer is used to correct the phase distortion and amplitude attenuation caused in the signal by the radio channel and by the nonlinear behavior of the transceiver devices. Therefore, it is necessary to estimate the physical channel in order to obtain the inverse channel estimation coefficients of the model. According to the specifications of the HIPERLAN/2 standard, there are several types of preambles or “training data”, which are inserted by the transmitter at the beginning of each data burst, in order to achieve channel synchronization and estimation [1].
There are two main approaches to estimating the channel frequency response. The first set of methods uses training data transmitted on each subcarrier. The second approach uses training information transmitted on a subset of the subcarriers. In this paper, we have selected the equalizer algorithm based on the first approach, since HIPERLAN/2 is a packet-type communication system. These WLAN models generally assume that the channel is constant during the length of the data packet. This greatly simplifies the channel estimation problem. It also avoids relative delays between symbols before the first channel estimates are calculated.
The goal of this paper is to present the design of a channel equalizer for OFDM WLANs according to the HIPERLAN/2 standard, and an implementation methodology that efficiently maps system level descriptions down to programmable logic devices, in our case Xilinx FPGAs. To this end, we have selected a high level model from among several low-complexity algorithms and adapted it to the channel estimation method used and to the selected device. This also implies the use of an environment that allows working with the hardware constraints and is oriented to the design of data flow systems.
This paper is structured as follows. Section 2 presents the HIPERLAN/2 channel models. Section 3 describes the channel equalizer technique using training data. Section 4 presents the environment tools and methodology. Section 5 explains the hardware model of the equalizer. Section 6 presents the algorithm validation and FPGA performance results obtained using Matlab/Simulink and System Generator tools. Section 7 ends with conclusions.
2 Channel Models
HIPERLAN/2 has been developed focusing on two major environments, which determine two different kinds of networks: the Domestic Premises Network (DPN), which covers a home environment, and the Business Premises Network (BPN), which covers larger areas such as universities, hospitals, airports, etc.
In HIPERLAN/2, five types of channel models have been approved [3], derived from measurements in typical indoor and outdoor environments. All types are used in our simulations.
The channel impulse response (CIR) is usually represented as a discrete time FIR filter [2]. The CIR can then be expressed as
$$h(t,\tau) = \sum_{i=1}^{L_{path}} \alpha_i(t)\, e^{-j 2\pi f_c \tau_i(t)}\, \delta(\tau - \tau_i(t)), \qquad (1)$$
where $t$ is time, $\tau_i$ and $\alpha_i$ are the delay and complex amplitude of path $i$, $f_c$ is the central carrier frequency and $\delta$ is the Dirac function. WLAN applications generally assume that the channel can be considered to be time-invariant, that is, the channel does not change during the data packet. Under this condition, equation (1) can be formulated as:
$$h(\tau) = \sum_{i=1}^{L_{path}} \alpha_i\, e^{-j 2\pi f_c \tau_i}\, \delta(\tau - \tau_i) \qquad (2)$$
On the other hand, the discrete frequency response of the channel $H_k$ is the Fourier transform of the discrete channel impulse response $h(i)$ if the CIR length is lower than the cyclic extension length, and can be expressed as:
$$H_k = DFT\{h(i)\} \qquad (3)$$
3 Channel Equalizer
We selected the y/x estimation type (y: received carrier, x: transmitted carrier). This low-complexity estimation algorithm allows real time implementation. The equalized data $X_{(n,k)}$, corresponding to subcarrier $k$ and OFDM symbol $n$, are given by
$$X_{(n,k)} = \frac{1}{|\hat{H}_k|} \cdot Y_{(n,k)} \cdot e^{-j(\hat{\alpha}_k + \sigma_n)}, \qquad (4)$$
where $\hat{H}_k$ is the channel estimate, $Y_{(n,k)}$ are the incoming data to correct, $\hat{\alpha}_k$ is the phase distortion estimate, and $\sigma_n$ is the accumulative phase angle. So, our implementation will compute the channel estimate coefficient $\hat{H}_k$ and the accumulative phase angle $\sigma_n$.
3.1 Channel Estimation Using Training Data
The training sequence consists of two OFDM symbols C, preceded by a cyclic prefix CP, which is a copy of the last 32 samples of the symbol C (Figure 1) [1].
[Figure 1: preamble layout CP, C, C, with the FFT windows for C(1,k) and C(2,k) and the sample offsets n0 and n0+16 marked.]
Figure 1. Preamble in the frequency domain
Symbol C is generated directly by applying the IFFT to the following sequence:
$$SC_{-26 \ldots 26} = \{1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, 1, 1, 1, 0, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, 1, 1\} \qquad (5)$$
The channel estimation process compares the received signal with the known training data in order to estimate its phase distortion and amplitude attenuation with a y/x estimation type. After FFT processing, the received training symbols $C_{(1,k)}$ and $C_{(2,k)}$ are modeled as the product of the training symbol $X_k$ and the channel $H_k$ plus additive noise $W_{(n,k)}$:
$$C_{(n,k)} = X_k H_k + W_{(n,k)}, \qquad k = 0, 1, 2, \ldots, N-1, \quad n = 1, 2 \qquad (6)$$
Hence the channel estimation process outputs $\hat{H}_k$, an estimate of the channel frequency response for each subcarrier:
$$\hat{H}_k = \frac{C_{(n,k)}}{X_k} = H_k + \frac{W_{(n,k)}}{X_k} \qquad (7)$$
It is necessary to emphasize that, according to the properties of the DFT, a start frame delay of $n_0$ samples in the time domain is equivalent to a phase angle in the frequency domain given by
$$x(n - n_0) \leftrightarrow X_k\, e^{-j 2\pi n_0 k / N} \qquad (8)$$
Thus, the training symbols can be expressed as
$$C_{(1,k)} = C'_{(1,k)}\, e^{-j 2\pi (n_0 + 16) k / N}, \qquad k = 0, 1, 2, \ldots, N-1 \qquad (9)$$
$$C_{(2,k)} = C'_{(2,k)}\, e^{-j 2\pi n_0 k / N}, \qquad k = 0, 1, 2, \ldots, N-1 \qquad (10)$$
where $C'_{(n,k)}$ is the training symbol received after the FFT process without noise. The phase angle due to the start frame delay $n_0$ plus the 16 samples is included in the channel estimation and is compensated when data are equalized, thus giving more flexibility to time synchronization.
On the other hand, according to equations (9) and (10), both symbols $C_{(1,k)}$ and $C_{(2,k)}$ have an equal mathematical structure but with an accumulative phase angle difference of $\pi/2$ per subcarrier: a delay of 16 samples, out of a total of 64 samples (one OFDM symbol), is equivalent to 90º. Then, in order to compute the average of both reference symbols, we have incrementally rotated the symbol $C_{(1,k)}$ by 90º, and the channel estimate is calculated as:
$$\hat{H}_k = \frac{C_{(1,k)}\, e^{j 2\pi 16 k / N} + C_{(2,k)}}{2 X_k} \qquad (11)$$
$$\hat{H}_k = H_k + \frac{W_{(1,k)} + W_{(2,k)}}{2 X_k} \qquad (12)$$
The noise samples $W_{(1,k)}$ and $W_{(2,k)}$ are statistically independent, thus the variance of their sum divided by two is half of the variance of the individual noise samples. The training symbol amplitudes $X_k$ have been selected to be equal to one, so equation (11) can be considered as follows:
$$\hat{H}_{1k} = \frac{C_{(1,k)}\, e^{j 2\pi 16 k / N}}{X_k} = \begin{cases} C_{(1,k)}\, e^{j 2\pi 16 k / N} & \text{if } X_k \geq 0 \\ -C_{(1,k)}\, e^{j 2\pi 16 k / N} & \text{if } X_k < 0 \end{cases} \qquad (13)$$
$$\hat{H}_{2k} = \frac{C_{(2,k)}}{X_k} = \begin{cases} C_{(2,k)} & \text{if } X_k \geq 0 \\ -C_{(2,k)} & \text{if } X_k < 0 \end{cases} \qquad (14)$$
Hence, equation (11) can be expressed as:
$$\hat{H}_k = \frac{\hat{H}_{1k} + \hat{H}_{2k}}{2} \qquad (15)$$
3.2 Phase Tracking
The phase tracking allows the data equalizer to correct the accumulative phase angle of each OFDM symbol, due to the frequency offset estimation error in the synchronization stage.
The phase tracking uses the pilot symbols, which are inserted between the data subcarriers. These pilots are generated from the polynomial $x^7 + x^4 + 1$ and are mapped by BPSK (-1, 1) in the OFDM transmitter (Figure 2) [1].
Figure 2. Sub-carrier map for data and pilots
To obtain the accumulated phase $\sigma_n$, we only need the phase of the pilot symbols, whose BPSK value is identified by the sign of the real part. This holds as long as the accumulative phase to correct stays within ±π/2. This tracking can be done using four pilots or even fewer, depending on how many corrections we want to make per OFDM symbol.
4 Environment Tools and Methodology
4.1 Environment Tools
Visual data flow environments are ideally suited for modeling DSP systems, as many algorithms are specified most naturally by signal flow graphs. Data flow tools are similar to traditional schematic capture tools because they provide libraries of functional blocks that can be composed graphically to model a system. However, in contrast to schematic tools, the library blocks and the simulation environment in a data flow tool provide a high level of functional abstraction, with polymorphic data types and operators to model arithmetic in integer, fixed-point and floating point data [6].
We have selected an XtremeDSP Development Kit, which provides a complete platform for high-performance signal processing applications. The kit includes a visual data flow environment (the System Generator) and the Nallatech prototyping platform (BenONE + BenADDA).
The System Generator for DSP uses Simulink to represent a high-level, abstract view of the DSP system; it automatically maps the system to a predictable, highly efficient hardware implementation that allows you to customize the architecture to suit your DSP algorithm. Furthermore, it facilitates the construction of designs that are driven by multiple, asynchronous clock sources. These multi-clock designs can employ a combination of clocks and the derived clock can be enabled, allowing advanced clocking strategies to be defined completely inside the tool.
Simulink provides a powerful high-level modeling environment for DSP systems and, consequently, it is widely used for algorithm development and verification. The Nallatech prototyping platform, including the BenADDA DIME-II module, provides high-speed digital-to-analogue and analogue-to-digital conversion capability. The module contains two high-speed ADC and two high-speed DAC channels, which allow for flexible, high-resolution data conversion for both base-band and IF applications. The key to the BenADDA’s performance is the on-board Xilinx Virtex-II FPGA, which provides a powerful data processing resource. Some of the main application areas for the BenADDA include mobile communication systems, infrared imaging, wideband cable systems and multi-channel, multi-mode receivers.
4.2 Methodology
The rapid prototype design flow used is shown in Figure 3. It begins with double precision exploration and validation of the algorithm in Matlab/Simulink/System Generator, followed by fixed-point modeling and simulation in Simulink/System Generator, with refinement between both steps. The System Generator provides three arithmetic data types: double precision floating point, and signed and unsigned fixed point numbers. This allows the floating and fixed-point algorithms to be explored. Then, an algorithm model is obtained and can be simulated. Once this model is validated, it is translated into efficient hardware (VHDL) and then synthesized and placed & routed into the FPGA by means of an automatic process. After this step, a co-verification model is obtained, which allows platform exploration with an iterative refinement process.
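The step from double precision to fixed point described above can be illustrated with a small Python sketch. It is not the System Generator data type implementation: the helper names `to_fixed` and `max_relative_error`, the saturation policy, the sample values and the choice of reserving three bits for sign and integer part are all assumptions made for illustration.

```python
def to_fixed(value, word_bits, frac_bits, signed=True):
    """Quantize a real value to fixed point and return it as a float."""
    scale = 1 << frac_bits
    raw = round(value * scale)                 # nearest integer code
    if signed:
        lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << word_bits) - 1
    raw = max(lo, min(hi, raw))                # saturate (assumed overflow policy)
    return raw / scale


def max_relative_error(samples, word_bits, frac_bits):
    """Largest relative error between double precision and fixed point."""
    return max(abs(to_fixed(s, word_bits, frac_bits) - s) / abs(s)
               for s in samples)


# Hypothetical amplitudes; sign bit plus two integer bits are assumed.
samples = [0.11, 0.37, 0.9, 1.6, 2.75]
errors = {bits: max_relative_error(samples, bits, bits - 3)
          for bits in (10, 12, 14, 16)}
for bits, err in errors.items():
    print(bits, "bits:", f"{100 * err:.4f} %")
```

Sweeping the word length in this way mirrors the refinement loop between the double precision and fixed-point models: the word length is increased until the error against the double precision reference is acceptable.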
[Figure 3: remaining stages of the flow: model validation (Simulink/System Generator); VHDL generation, synthesis, place & route (System Generator/ISE); co-verification model (Simulink/System Generator); exploration model on the XtremeDSP platform; refine loops feed back to earlier stages.]
Figure 3. Rapid Prototype Design Flow
5 Implementation
The hardware block diagram proposed for implementation of the channel equalizer is shown in Figure 4, where the implementation has been divided into four blocks. These blocks are: channel estimator, inverse channel coefficients, data equalizer and phase tracking.
[Figure 4: the input C(n,k) / Y(n,k) feeds the channel estimator, which outputs $\hat{H}_k$; the inverse channel coefficients block outputs $1/\hat{H}_k$; the phase tracking block takes the pilot symbol and outputs the phase offset; the data equalizer combines both.]
Figure 4. Channel equalizer scheme
It is important to look for the best implementation of each block. It is also necessary to fix the system latency, since the system has to start working when the first OFDM symbol arrives at the channel equalizer. This determines the number of iterations allowed for the algorithms according to their operating clock frequency.
5.1 Channel Estimator
The hardware block diagram proposed for implementation of the channel estimator is shown in Figure 5. The rotator block incrementally rotates each subcarrier of symbol $C_{(1,k)}$ by 90º. This rotation is counter-clockwise and, therefore, it is necessary to rotate according to a repetitive cyclic sequence of 0º, 90º, 180º and 270º over all 64 subcarriers. The proposed method is very simple, and its low-cost implementation is achieved by interchanging the real and imaginary parts and their signs in the suitable form.
[Figure 5: 90º rotator on C(1,k); sign multiplexers (-1, enable) driven by the ROM holding training symbol C ($X_k$) and its address generator; $Z^{-64}$ buffer delaying $\hat{H}_{1k}$; mean block combining the delayed $\hat{H}_{1k}$ with $\hat{H}_{2k}$; write controller and RAM storing the channel estimate $\hat{H}_k$.]
Figure 5. Channel estimator block
Equations (13) and (14) are implemented using a ROM memory that contains the sign changes of training symbol $X_k$ (0 keeping sign, 1 changing sign). These values control the two multiplexers that change the signs of the training symbols $C_{(1,k)}$ and $C_{(2,k)}$. The mean block implements equation (15). First, $\hat{H}_{1k}$ is delayed by 64 samples using a buffer to synchronize in time the two channel estimates $\hat{H}_{1k}$ and $\hat{H}_{2k}$. Next, $\hat{H}_{1k}$ and $\hat{H}_{2k}$ are added and divided by 2 (a simple shift operation). The result is the channel estimate $\hat{H}_k$, which is kept in a RAM memory.
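The data path of the channel estimator (multiplier-free 90º rotator, sign selection from the ROM, and mean) can be sketched in floating-point Python as follows. This is only a behavioural sketch under stated assumptions: the function names are hypothetical, every subcarrier is assumed to carry $X_k = \pm 1$ (null subcarriers are ignored), the start frame delay $n_0$ is taken as zero, and the test channel and sign pattern are made up.

```python
import cmath

N = 64  # subcarriers per OFDM symbol (FFT size)


def rotate_by_quarter_turns(c, k):
    """Multiply c by j**k using only swaps and sign changes."""
    re, im = c.real, c.imag
    q = k % 4
    if q == 0:
        return complex(re, im)      # 0 degrees
    if q == 1:
        return complex(-im, re)     # 90 degrees
    if q == 2:
        return complex(-re, -im)    # 180 degrees
    return complex(im, -re)         # 270 degrees


def estimate_channel(c1, c2, x_sign):
    """Average both training symbols; x_sign[k] is 0 (keep) or 1 (negate)."""
    h = []
    for k in range(N):
        h1 = rotate_by_quarter_turns(c1[k], k)   # equation (13) before the sign
        h2 = c2[k]                               # equation (14) before the sign
        if x_sign[k]:
            h1, h2 = -h1, -h2
        h.append((h1 + h2) / 2)                  # equation (15); a shift in hardware
    return h


# Noise-free check with a made-up channel and a hypothetical sign pattern.
h_true = [cmath.rect(1.0 + 0.01 * k, 0.05 * k) for k in range(N)]
x_sign = [(k // 3) % 2 for k in range(N)]
x = [-1.0 if s else 1.0 for s in x_sign]
c2 = [x[k] * h_true[k] for k in range(N)]
c1 = [c2[k] * cmath.exp(-2j * cmath.pi * 16 * k / N) for k in range(N)]
h_est = estimate_channel(c1, c2, x_sign)
print(max(abs(a - b) for a, b in zip(h_est, h_true)))
```

Because the 16-sample offset corresponds to exactly a quarter turn per subcarrier when N = 64, the rotation never needs a multiplier, which is the point of the swap-and-negate rotator.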
5.2 Inverse Channel Coefficients
The channel estimate $\hat{H}_k$ has the complex form $a + bj$, so it is necessary to know its module and phase to make the inverse operation easy. Thus, the inverse channel coefficients can be expressed as:
$$\frac{1}{\hat{H}_k} = \frac{1}{a_k + b_k j} = \frac{1}{|\hat{H}_k| \cdot e^{j\hat{\alpha}_k}} = \frac{e^{-j\hat{\alpha}_k}}{|\hat{H}_k|} \qquad (16)$$
The CORDIC algorithm (COordinate Rotation DIgital Computer) is used to obtain the module and the phase, and the Hung method is used for the inverse. Figure 6 shows the implemented hardware block.
[Figure 6: $\hat{H}_k$ enters a CORDIC module-and-phase unit; the module $|\hat{H}_k|$ goes through the inverter to give $1/|\hat{H}_k|$, and the phase is negated to give $\angle -\alpha$.]
Figure 6. Inverse channel coefficient block
5.2.1 CORDIC algorithm
This classical algorithm is derived from the general (Givens) rotation transform [4] and it allows trigonometric functions to be computed. For our implementation, it is necessary to compute the inverse channel coefficients in real time. This means all computations must be performed in one reference clock cycle. Hence, the ideal architecture for this block is the pipelined CORDIC configuration. The CORDIC algorithm requires one shift-addsub operation per bit of output width. Our CORDIC, implemented with the pipelined architecture configuration, performs these operations in parallel by using an array of shift-addsub modules. So, a suitable choice of the bit precision and the number of stages is very important for the final cost.
5.2.2 The Hung algorithm
This algorithm is a high radix division algorithm based on the Taylor series expansion. It combines the first two terms of the Taylor series, and requires only a small lookup table to generate accurate results [5]. In this case, the module $|\hat{H}_k|$ that enters the division is positive and ranges from 0 to 3. So, at least two integer bits are required to represent it.
Next, we describe the basic steps of Hung’s method as used for this application.
Let $Y$ be a 2m-bit fixed point number between zero and four defined by equation (17), where $y_i \in \{0, 1\}$.
$$Y = 2^{1} y_0 + 2^{0} y_1 + 2^{-1} y_2 + \cdots + 2^{-(2m-2)} y_{2m-1} \qquad (17)$$
To calculate $1/Y$, $Y$ is decomposed into two parts: the higher order bits $Y_h$ and the lower order bits $Y_l$, where $Y_h$ contains the m most significant bits (18) and $Y_l$ contains the m least significant bits (19).
$$Y_h = 2^{1} y_0 + 2^{0} y_1 + 2^{-1} y_2 + \cdots + 2^{-(m-2)} y_{m-1} \qquad (18)$$
$$Y_l = 2^{-(m-1)} y_m + 2^{-m} y_{m+1} + \cdots + 2^{-(2m-2)} y_{2m-1} \qquad (19)$$
Then, we get the following equation.
$$\frac{1}{Y} = \frac{1}{Y_h + Y_l} = \frac{Y_h - Y_l}{Y_h^2 - Y_l^2} \qquad (20)$$
The approximation in equation (21) is equivalent to combining the first two terms of the Taylor series.
$$\frac{1}{Y} \approx \frac{Y_h - Y_l}{Y_h^2} \qquad (21)$$
Figure 7 shows the block diagram implementation of the algorithm according to equation (21).
[Figure 7: the operand Y (2m bits) is split into $Y_h$ and $Y_l$; $Y_h$ addresses the lookup table that returns $1/Y_h^2$ while a subtractor forms $Y_h - Y_l$; a multiplier combines both to give the result $1/Y$.]
Figure 7. Block diagram of the algorithm
In the first step, the algorithm retrieves the value $1/Y_h^2$ from a lookup table and generates $Y_h - Y_l$ at the same time. In the second step, $1/Y_h^2$ and $Y_h - Y_l$ are multiplied to obtain the result. The latency of the proposed algorithm is 1 cycle (1 MULT) according to the hardware module selection.
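The two steps of the inverse channel coefficient block, CORDIC for module and phase followed by the Hung reciprocal of equation (21), can be sketched behaviourally in Python. This is a floating-point illustration, not the implemented hardware: the function names, the default of 16 CORDIC stages, m = 8 and the computation of the table entry on the fly (instead of reading a ROM) are assumptions, and multiplications by powers of two stand in for shifts.

```python
import math


def cordic_vectoring(x, y, stages=8):
    """Return (module, phase) of x + jy using shift-add style iterations."""
    z = 0.0
    if x < 0:  # pre-rotate by 180 degrees into the right half-plane
        x, y, z = -x, -y, (math.pi if y >= 0 else -math.pi)
    gain = 1.0
    for i in range(stages):
        t = 2.0 ** -i                       # a right shift by i bits in hardware
        if y > 0:                           # rotate clockwise towards the x axis
            x, y, z = x + y * t, y - x * t, z + math.atan(t)
        else:                               # rotate counter-clockwise
            x, y, z = x - y * t, y + x * t, z - math.atan(t)
        gain *= math.sqrt(1.0 + t * t)      # constant CORDIC gain, known offline
    return x / gain, z


def hung_reciprocal(y, m=8):
    """Approximate 1/y for a 2m-bit operand in (0, 4) using equation (21)."""
    frac_bits = 2 * m - 2                   # two integer bits, as in equation (17)
    code = round(y * (1 << frac_bits))      # 2m-bit operand Y
    if not 0 < code < (1 << (2 * m)):
        raise ValueError("operand must lie in (0, 4)")
    yh_code = code >> m                     # m most significant bits
    yl_code = code & ((1 << m) - 1)         # m least significant bits
    if yh_code == 0:
        raise ValueError("Yh is zero: operand too small for the table")
    yh = (yh_code << m) / (1 << frac_bits)
    yl = yl_code / (1 << frac_bits)
    table_entry = 1.0 / (yh * yh)           # would be read from a 2**m-entry table
    return (yh - yl) * table_entry


def inverse_channel_coefficient(h, stages=16, m=8):
    """Return (1/|h|, -phase(h)), the two outputs of the block in Figure 6."""
    module, phase = cordic_vectoring(h.real, h.imag, stages)
    return hung_reciprocal(module, m), -phase


inv_module, neg_alpha = inverse_channel_coefficient(complex(0.9, 1.2))
print(inv_module, neg_alpha)
```

The relative error of the reciprocal is about $(Y_l / Y_h)^2$, so it grows for small operands, where $Y_h$ holds few significant bits.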
5.3 Phase Tracking
The hardware block proposed for the implementation of the phase tracking is shown in Figure 8. In our application, the phase angle to correct is very small, so we have decided to make a single correction per OFDM symbol, taking as a reference the last pilot of the OFDM symbol, $Y_{(n,P1)}$.
[Figure 8: a comparator detects Y(n,k) = Pilot(n,P1); the pilot is registered and, depending on the sign of its real part (Sign=1), rotated by 180º; a CORDIC unit computes its arctangent; the negated phase is accumulated and limited to -2π to 2π, with enable and delay signals for the CORDIC.]
Figure 8. Phase tracking block
Once the comparator detects the pilot symbol, it is registered according to the sign of its real part. The phase of the registered pilot symbol is obtained with a CORDIC unit. In this case, the delay of the CORDIC processing is not critical (4 µs per OFDM symbol), so an iterative CORDIC structure can be used. The obtained phase is accumulated with the opposite sign and truncated to the interval ±2π.
5.4 Data Equalizer
The data equalizer corrects the phase distortion and attenuation of the received signal using the inverse channel coefficients. Figure 9 shows the hardware block diagram implemented according to equation (4).
[Figure 9: the phase from the offset tracking block and the phase $\angle -\alpha_k$ from the inverse channel coefficients block are accumulated and limited to -2π to 2π; a rotator applies the resulting angle to Y(n,k) and a multiplier scales it by $|1/\hat{H}_k|$ to give X(n,k).]
Figure 9. Data equalizer block
The phases of both the channel estimation coefficients and the phase tracking are added and limited to the range ±2π (truncating them if required). The CORDIC unit rotates the received data by this resulting angle. In this case, the CORDIC unit also uses the pipelined architecture, because it is necessary to produce one output symbol per cycle. The corrected input is multiplied by the inverted channel estimate to obtain the equalized data.
6 Results
The development environment used is composed of the Matlab/Simulink and System Generator tools, which make it possible to work at a system level, in terms of injecting stimuli, analyzing results and changing both algorithm and implementation level parameters. This environment permits the development of designs optimized for Xilinx FPGAs, since it can include parts of the proprietary designs, Xilinx library blocks and third party cores that adapt to the device family and the available resources of the selected FPGA.
6.1 Algorithm validation
In order to obtain the results of the channel equalizer, it has been included in a complete HIPERLAN/2 transceiver model (Figure 10). This model has the following characteristics: 12 Mbit/s data rate, QPSK modulation, coding rate ½, 48 data subcarriers, 4 pilot symbols, 16 samples of cyclic prefix, 4 µs OFDM symbol period and a target clock of 60 MHz. The clock for the channel equalizer model is 16 MHz so as to maintain a continuous data flow, and the timing constraint has been fixed to 50 ns. It is very important to select the correct bit precision of the equalizer block, in order to get an optimal balance between performance and cost. For both Table 1 and Figure 11, Uvic refers to our design with the System Generator (using fixed point arithmetic) and Simulink refers to the same model validated with Simulink (using floating point arithmetic).
Figure 10. Transceiver HIPERLAN/2 model
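For algorithm-level validation, the phase tracking and data equalizer blocks can be mirrored by a short floating-point reference in Python. This is a sketch under explicit assumptions rather than the validated Simulink model: the function names are hypothetical, `cmath.phase` stands in for the CORDIC units, the residual pilot phase is measured after the current correction and applied to the same symbol, and the channel, data and drift values are made up.

```python
import cmath
import math


def pilot_phase(pilot):
    """Phase of a BPSK pilot after removing its sign (valid within +/- pi/2)."""
    if pilot.real < 0:
        pilot = -pilot                 # rotate by 180 degrees
    return cmath.phase(pilot)          # stand-in for the CORDIC arctangent


def equalize_symbol(y, h_est, pilot_index, sigma):
    """Apply equation (4) to one OFDM symbol; return (x, updated sigma)."""
    alpha_p = cmath.phase(h_est[pilot_index])
    # Residual phase seen on the pilot after the current correction.
    residual = pilot_phase(y[pilot_index] * cmath.exp(-1j * (alpha_p + sigma)))
    sigma = math.fmod(sigma + residual, 2 * math.pi)   # keep within +/- 2*pi
    x = [yk * cmath.exp(-1j * (cmath.phase(hk) + sigma)) / abs(hk)
         for yk, hk in zip(y, h_est)]
    return x, sigma


# Made-up 8-subcarrier example: QPSK data plus one BPSK pilot (-1) at index 7.
h = [cmath.rect(0.5 + 0.1 * k, 0.3 * k) for k in range(8)]
tx = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 0j]
sigma = 0.0
worst = 0.0
for n in range(1, 4):
    drift = 0.05 * n                   # common phase error of symbol n
    y = [t * hk * cmath.exp(1j * drift) for t, hk in zip(tx, h)]
    x, sigma = equalize_symbol(y, h, pilot_index=7, sigma=sigma)
    worst = max(worst, max(abs(a - b) for a, b in zip(x, tx)))
print(worst, sigma)
```

With phase tracking enabled the equalized points stay on the transmitted constellation; removing the `sigma` update leaves the common rotation uncorrected.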
Bit precision    Ĥ(k) error max. (%)    1/Ĥ(k) error max. (%)    Y(n,k)/Ĥ(k) error max. (%)
10               0.96                   13.60                    15.98
12               0.38                   6.25                     6.04
14               0.11                   2.44                     2.45
16               0.01                   2.03                     2.02
Table 1. Relative Error for Uvic vs. Simulink equalizers with respect to bit precision.
The relative errors between both implementations, shown in Table 1, help to select the best alternative. According to the results in Table 1, we selected a 14 bit precision as the optimal balance between the relative error and the FPGA resource usage. The performance obtained by our channel estimation block is satisfactory, as shown in Figure 11 and Table 1, with a maximum relative error of 0.11% compared to the Simulink channel estimation.
[Figure 11: four panels, Module Channel Response Uvic, Phase Channel Response Uvic, Module Channel Response Simulink and Phase Channel Response Simulink; horizontal axis: Frequency (MHz), from -10 to 10.]
Figure 11. Module and phase channel response for Uvic and Simulink estimators.
The phase tracking block has been simulated for 2 ms using the HIPERLAN/2 transceiver. Figure 12(a) shows the distribution of the constellation with the phase tracking block. It can be seen that the constellation stays at a static position. On the other hand, Figure 12(b) shows the distribution of the constellation without that block. In this case, the constellation rotates.
Figure 12. Constellation Distribution with (a) and without (b) phase tracking
6.2 Hardware Implementation
The FPGA resource usage for both the iterative and pipelined CORDIC architectures is shown in Table 2 as a function of the number of stages. The accuracy that is needed for the application fixes the number of CORDIC stages. In our channel equalizer, we decided to use 8 stages (10 cycles of latency). In this case, the serial architecture uses roughly half of the FPGA resources used by the pipelined architecture. On the other hand, Table 3 shows the FPGA resources used to prototype the channel equalizer on the Nallatech prototyping platform for 12 bit and 14 bit precisions.
STAGES    CONFIG      SLICES    FFs    LUTs
8         SERIAL      154       153    222
8         PARALLEL    234       408    430
10        SERIAL      155       154    229
10        PARALLEL    278       492    516
Table 2. FPGA resources for the CORDIC block
FPGA Virtex 300 and clock = 16 MHz
Precision    Number of Slices    Number of BRAMs
12 bits      1636 (53%)          1 (6%)
14 bits      1911 (62%)          1 (6%)
Table 3. FPGA resources for the equalizer
7 Conclusion
In this paper we have presented an approach for the system design and validation of a HIPERLAN/2 channel equalizer unit, and the corresponding hardware implementation. The obtained channel equalizer combines simplicity and efficiency, covering the HIPERLAN/2 standard specifications. Several methods have been applied to reduce implementation complexity, such as the Hung method to calculate 1/X, the use of one pilot symbol for phase tracking, and the combination of pipelined and iterative CORDIC architectures. In comparison with a similar equalizer implementation presented in [2], the resource usage obtained is 41% better. The Matlab/Simulink and System Generator environments accelerated the design by starting at the system level. Blocks with different levels of implementation refinement can be mixed for model verification and parameter validation, using the same stimuli and scopes. This produces faster and more efficient implementations of complex systems.
References
[1] HIPERLAN/2, ETSI-TS-101-475-v1.3.1, 2001.
[2] J. Heiskala, J. Terry, OFDM Wireless LANs: a theoretical and practical guide, Sams Publishing, 2002.
[3] J. Medbo, P. Schramm, Channel Models for HIPERLAN/2 in Different Indoor Scenarios, ETSI EP BRAN, March 1998.
[4] R. Andraka, A survey of CORDIC algorithms for FPGA based computers, 6th Intl. Symp. on FPGA, 1998.
[5] P. Hung, H. Fahmy, O. Mencer and M.J. Flynn, Fast Division Algorithm with Small Lookup Table, 33rd Asilomar Conf. on Signals, Systems and Computers, Vol. 2, pp. 1465-1468, May 1999.
[6] J. Hwang, B. Milne, N. Shirazi, J. Stroomer, System Level Tools for DSP in FPGAs, Xilinx Inc., USA.