FPGA Based Dual Carrier Modulation Soft Mapper and Demapper for the MB-OFDM UWB Platform

Runfeng Yang, R. Simon Sherratt, Oswaldo Cadenas
Email: {r.yang, r.s.sherratt, o.cadenas}@reading.ac.uk
Signal Processing Laboratory, The University of Reading, UK

Abstract: In this paper a Dual Carrier Modulation (DCM) soft mapper and demapper are presented, optimized toward the WiMedia consortium Ultra-Wide Band (UWB) radio platform (ECMA-368). The DCM demapper exploits a Channel-State-Information (CSI) aided scheme coupled with the band hopping information, resulting in improved performance compared to previous systems. The proposed mapper and demapper are presented in the context of ECMA-368 as the chosen physical implementation for high data rate Wireless USB (W-USB) and Fast Bluetooth. Implementation aspects of the DCM soft mapper and demapper on a Xilinx field programmable gate array (FPGA) are addressed. The DCM soft mapper is implemented with logic minimization to reduce the design size. The DCM soft demapper is implemented by computing a dedicated DCM demapping algorithm that delivers accurate demapping and symbol reliability information. The design, written in Verilog, is efficiently synthesized on a Xilinx Virtex-4 FX FPGA built on a 90-nm copper CMOS process.

Index Terms: MB-OFDM, WiMedia, DCM, CSI, FPGA, Verilog.
I. INTRODUCTION

In 2005 the WiMedia Alliance [1], working with ECMA International, announced the establishment of the WiMedia MB-OFDM (Multiband Orthogonal Frequency Division Multiplexing) UWB (Ultra-Wide Band) radio platform as its global UWB standard. ECMA-368 [2], published in December 2005, is the first version of the WiMedia UWB Physical (PHY) layer and Medium Access Control (MAC) layer standard and is based heavily on the IEEE 802.15.3a MB-OFDM proposal by Batra et al. [3]. ECMA-368 has also been chosen as the radio platform for high-speed Wireless USB (W-USB) and Fast Bluetooth.

To transmit a PHY Service Data Unit (PSDU) with preceding training sequences and a Physical Layer Convergence Protocol (PLCP) frame, ECMA-368 has eight transmission modes that apply various levels of coding and diversity. Frequency-domain spreading, time-domain spreading, and Forward Error Correction (FEC) coding are used to offer 53.3, 80, 106.7, 160, 200, 320, 400 or 480 Mbit/s to the MAC layer.

After bit interleaving, the coded and interleaved binary data sequence is mapped onto a complex constellation. Two complex constellation schemes, Quadrature Phase Shift Keying (QPSK) and Dual Carrier Modulation (DCM), are adopted as the mapping techniques in ECMA-368. DCM [4] was introduced to the MB-OFDM proposal by Batra et al. [3] as one of the enhancements that led to the current WiMedia Alliance standard [2]. For data rates of 200 Mbit/s and lower, a QPSK constellation is used. For data rates of 320 Mbit/s and higher, DCM is used as a multi-dimensional constellation. Fig. 1 and Fig. 2 depict the encoding and decoding process for the coded PSDU respectively. The PSDU data rate-dependent parameters are listed in Table 1. For all the DCM modes, the interleaver operates over 1200 bits but the DCM operation is on successive blocks of 200 bits. As will be seen later, 200 bits are mapped into 100 complex symbols for transportation inside one 128-point OFDM frame.

In the MB-OFDM system, Channel State Information (CSI) has previously been used to enhance the channel decoder's error correction performance [5]. Least Squares (LS) equalization is one of the popular equalization methods for an OFDM based system and computes the CSI with low complexity. A CSI aided scheme coupled with the band hopping information maximizes the soft demapping performance [6] in the presence of narrow-band interferers.

This paper discusses an FPGA based design and implementation of a DCM soft mapper and demapper suitable for ECMA-368. Section II presents the clock control for the DCM. Section III presents the DCM mapping and its FPGA based design and implementation. Section IV presents the DCM soft demapping with the CSI aided scheme and its FPGA based design and implementation. Section V provides the timing and area results of mapping the proposed DCM soft mapper and demapper into an FPGA device. Conclusions are given in Section VI.

This work was supported in part by The University of Reading Overseas Research Postgraduate Studentships.
ISBN: 1-9025-6016-7 © 2007 PGNet

[Figure 1. Encoding process for the coded PSDU at the Transmitter. Blocks shown: Coded PSDU, Bit Interleaver, QPSK or DCM Mapper, IFFT, Y_T(n).]

[Figure 2. Decoding process for the coded PSDU at the Receiver. Blocks shown: Y_R(n), FFT, Channel Equalize, CSI, QPSK or DCM Demapper, Bit Deinterleaver, Deinterleaved PSDU.]

Table 1. PSDU rate-dependent Parameters

Data Rate (Mb/s)   Modulation   Coding Rate (R)   Coded Bits / 6 OFDM Symbols   Info Bits / 6 OFDM Symbols
53.3               QPSK         1/3               300                           100
80                 QPSK         1/2               300                           150
106.7              QPSK         1/3               600                           200
160                QPSK         1/2               600                           300
200                QPSK         5/8               600                           375
320                DCM          1/2               1200                          600
400                DCM          5/8               1200                          750
480                DCM          3/4               1200                          900

Table 2. DCM Mapping table
Input Bits (b[g(k)], b[g(k)+1], b[g(k)+50], b[g(k)+51])   I[k]   Q[k]   I[k+50]   Q[k+50]
0000                                                       -3     -3      1         1
0001                                                       -3     -1      1        -3
0010                                                       -3      1      1         3
0011                                                       -3      3      1        -1
0100                                                       -1     -3     -3         1
0101                                                       -1     -1     -3        -3
0110                                                       -1      1     -3         3
0111                                                       -1      3     -3        -1
1000                                                        1     -3      3         1
1001                                                        1     -1      3        -3
1010                                                        1      1      3         3
1011                                                        1      3      3        -1
1100                                                        3     -3     -1         1
1101                                                        3     -1     -1        -3
1110                                                        3      1     -1         3
1111                                                        3      3     -1        -1
II. CLOCK CONTROL

As shown in Table 1, DCM is used at three specific high data rates. The Virtex-4 can operate at up to 500 MHz. The Virtex-4 Digital Clock Managers (also abbreviated DCMs, not to be confused with Dual Carrier Modulation) provide a wide range of clock management features, including clock de-skew, frequency synthesis and phase shifting. Using these FPGA based DCMs, the global circuit clock can be driven at 320, 400 or 480 MHz as required for the high data rates of DCM modulation.

III. DCM MAPPING

In DCM constellation mapping, the 1200 interleaved bits are divided into 6 groups of 200 bits, and each group of 200 bits is further divided into 50 groups of 4 bits. Each group of 4 bits is represented as (b[g(k)], b[g(k)+1], b[g(k)+50], b[g(k)+51]), where k ∈ [0…49] and

$$
g(k) =
\begin{cases}
2k, & k \in [0 \ldots 24] \\
2k + 50, & k \in [25 \ldots 49]
\end{cases}
\tag{1}
$$
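The grouping defined by (1) can be illustrated with a short Python sketch. The function names `g`, `group_indices` and `split_block` are hypothetical and chosen only for this illustration; the sketch simply evaluates (1) and confirms that the 50 groups together use each of the 200 bit positions exactly once.

```python
def g(k: int) -> int:
    """Index function from (1): first bit position of group k in a 200-bit block."""
    if not 0 <= k <= 49:
        raise ValueError("k must be in 0..49")
    return 2 * k if k <= 24 else 2 * k + 50


def group_indices(k: int) -> tuple:
    """Bit positions (g(k), g(k)+1, g(k)+50, g(k)+51) that form group k."""
    base = g(k)
    return (base, base + 1, base + 50, base + 51)


def split_block(bits: list) -> list:
    """Split one 200-bit block into the 50 groups of 4 bits used by the mapper."""
    if len(bits) != 200:
        raise ValueError("expected a block of 200 bits")
    return [tuple(bits[i] for i in group_indices(k)) for k in range(50)]


if __name__ == "__main__":
    # Collect every bit position referenced by the 50 groups.
    used = sorted(i for k in range(50) for i in group_indices(k))
    print("all 200 positions used once:", used == list(range(200)))
    print("group 0 :", group_indices(0))
    print("group 25:", group_indices(25))
```

Groups 0 to 24 draw on bit positions 0 to 99 and groups 25 to 49 on positions 100 to 199, so the two bit pairs of every group are always 50 positions apart.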
The mapping between bits and complex output symbols is enumerated in Table 2. The two resulting complex numbers (I[k], Q[k]) and (I[k+50], Q[k+50]) are allocated to two individual OFDM sub-carriers separated by at least 200 MHz, which offers a frequency diversity gain for robustness against multi-path and interference. The 100 mapped complex symbols are sent sequentially to the 128-point IFFT (Inverse Fast Fourier Transform) block for OFDM.

To implement the DCM mapper and its mapping table in a resource-limited FPGA, an efficient, low-cost, high-performance solution is needed. The coded input data has 4 bits, whose values correspond to the digits 0, 1, …, 15. Each output value also has 4 bits, but takes only the four values -3, -1, 1 and 3. The logic functions relating the input and output data can therefore be simplified. The Espresso logic minimizer supports the design of large digital logic circuits [7] [8]. From the Espresso logic minimization, the combinational logic for each 4-bit output value is obtained as shown in (2) and (3).
$$
I \ (\text{or } Q) = \{\text{bit 3},\ \text{bit 2},\ \text{bit 1},\ \text{bit 0}\}
$$

$$
\begin{aligned}
I[k] &= \{\, \overline{b[g(k)]},\ \overline{b[g(k)]},\ b[g(k)+1],\ b[g(k)] + \overline{b[g(k)]} \,\} \\
Q[k] &= \{\, \overline{b[g(k)+50]},\ \overline{b[g(k)+50]},\ b[g(k)+51],\ b[g(k)] + \overline{b[g(k)]} \,\}
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
I[k+50] &= \{\, b[g(k)+1],\ b[g(k)+1],\ b[g(k)],\ b[g(k)] + \overline{b[g(k)]} \,\} \\
Q[k+50] &= \{\, b[g(k)+51],\ b[g(k)+51],\ b[g(k)+50],\ b[g(k)] + \overline{b[g(k)]} \,\}
\end{aligned}
\tag{3}
$$

An overbar denotes the logical complement and + denotes logical OR, so bit 0 is always 1. Read as 4-bit two's complement values, these expressions reproduce Table 2.

The proposed DCM soft mapper circuitry contains a Finite State Machine (FSM) and a mapping block, as depicted in Fig. 3. Some blocks are omitted from the Mapper to simplify the diagram. The Mapper consists of a counter, which counts from 0 to 99, and some combinational logic. The FSM controller has 4 states and controls all blocks in the circuitry in each state. The DCM soft mapper starts working only after the last of the 1200 bits has been processed by the interleaver. The mapper then maps 200 bits at a time, six times in total. The 200 bits are mapped into 100 complex symbols within 50 clock cycles; during these cycles the mapper outputs the first 50 complex symbols and stores the last 50. During the next 50 clock cycles, the mapper outputs only the last 50 complex symbols.
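The minimized logic of (2) and (3) can be cross-checked against Table 2 in software. The Python sketch below is only an illustration, not the Verilog used in the design. It rests on two assumptions that are not stated explicitly in the text: that the 4-bit outputs are two's complement numbers, and that the complemented terms are the ones marked in the `map_logic` comments. The function names are hypothetical.

```python
from itertools import product

# Table 2 columns, indexed by the 4 input bits read as an integer 0..15
# (b[g(k)] is the most significant bit).
I_K = [-3] * 4 + [-1] * 4 + [1] * 4 + [3] * 4
Q_K = [-3, -1, 1, 3] * 4
I_K50 = [1] * 4 + [-3] * 4 + [3] * 4 + [-1] * 4
Q_K50 = [1, -3, 3, -1] * 4


def to_signed4(bit3, bit2, bit1, bit0):
    """Read {bit3, bit2, bit1, bit0} as a 4-bit two's complement number (assumption)."""
    value = (bit3 << 3) | (bit2 << 2) | (bit1 << 1) | bit0
    return value - 16 if value >= 8 else value


def map_logic(b0, b1, b2, b3):
    """Logic of (2) and (3); b0..b3 are b[g(k)], b[g(k)+1], b[g(k)+50], b[g(k)+51]."""
    one = b0 | (1 - b0)  # bit 0: b OR NOT b, which is always 1
    i_k = to_signed4(1 - b0, 1 - b0, b1, one)    # {NOT b0, NOT b0, b1, 1}
    q_k = to_signed4(1 - b2, 1 - b2, b3, one)    # {NOT b2, NOT b2, b3, 1}
    i_k50 = to_signed4(b1, b1, b0, one)          # {b1, b1, b0, 1}
    q_k50 = to_signed4(b3, b3, b2, one)          # {b3, b3, b2, 1}
    return i_k, q_k, i_k50, q_k50


def map_table(b0, b1, b2, b3):
    """Direct lookup in Table 2."""
    n = (b0 << 3) | (b1 << 2) | (b2 << 1) | b3
    return I_K[n], Q_K[n], I_K50[n], Q_K50[n]


if __name__ == "__main__":
    # Compare the logic against the table for all 16 input patterns.
    mismatches = [bits for bits in product((0, 1), repeat=4)
                  if map_logic(*bits) != map_table(*bits)]
    print("mismatching input patterns:", mismatches)
```

Under these assumptions each output needs only wiring and at most one inverter per bit, which is consistent with the small mapper area reported later in the paper.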
[Figure 3. FPGA based DCM soft Mapper block diagram. Blocks shown: Input Bits, Mapper, I and Q outputs, DCMs at 320 MHz (or 400 MHz, or 480 MHz), FSM Controller.]

IV. DCM DEMAPPING

The DCM demapper sits in the receiver chain. It demaps the received complex numbers, carried on two different sub-carriers, back to a group of 4 bits, and outputs groups of 200 soft bits. The proposed DCM soft demapper employs a related matrix factor to combine the two received complex numbers, previously transmitted on different sub-carriers, into Maximum Likelihood (ML) soft bits [6]. In addition, the proposed DCM demapper exploits a CSI aided scheme coupled with the band hopping information to produce more accurate soft bits [6]. Each OFDM sub-carrier position has a dynamic estimate of the data reliability. This dynamic estimate in the frequency domain is defined as Channel State Information (CSI). Each data carrier has a potentially different CSI based on the power of the channel estimate at that frequency. Therefore, the more CSI measurements that can be taken, the more reliable the CSI estimate is in the presence of thermal noise, and the better the demapping result. The proposed DCM demapping algorithm is:

$$
\begin{aligned}
b_1 &= (2 I_{R_n} + I_{R_{n+50}}) \cdot \mathrm{CSI} \\
b_2 &= (I_{R_n} - 2 I_{R_{n+50}}) \cdot \mathrm{CSI} \\
b_3 &= (2 Q_{R_n} + Q_{R_{n+50}}) \cdot \mathrm{CSI} \\
b_4 &= (Q_{R_n} - 2 Q_{R_{n+50}}) \cdot \mathrm{CSI}
\end{aligned}
\tag{4}
$$

The receiver converts each time-domain OFDM symbol into the frequency domain via the Fast Fourier Transform (FFT). The DCM soft demapper demaps the two received and equalized complex numbers back to b1, b2, b3, b4 as a group of 4 bits. The CSI is produced from LS equalization based on the Channel Estimation (CE) at the receiver [6]. There are 600 CSI values, covering the 100 symbols over the 6 channel estimation frames, and they are reused for the computation until a new CE is obtained. In DCM, one constellation point is related to two different OFDM sub-carriers separated by at least 200 MHz, so frequency diversity can lead to two different CSI values (CSI_n, CSI_{n+50}), each associated with its data carrier. The smaller CSI is chosen as the more reliable CSI to aid the demapping.

The proposed DCM soft demapper circuitry depicted in Fig. 4 contains an FSM controller that controls a demapping block, 2 Single Port RAMs and 5 Dual Port RAMs in each state. The FSM has 7 states and is tied to a counter that counts from 0 to 275. The DCM soft demapper circuitry receives 50 complex symbols and stores them in “50 x 8 Single Port RAM1” and “50 x 8 Single Port RAM2” respectively, all within 50 clock cycles. The next 25 incoming I and Q values are then combined with the first 25 stored I and Q values from the RAMs and the corresponding CSI from “600 x 8 Dual Port RAM”. While the last 25 incoming complex symbols are combined with the last 25 stored complex symbols and the corresponding CSI, the first 25 soft bits, computed and stored in “50 x 16 Dual Port RAM1”, are output. All the demapping computations and the output of the first 25 soft bits are completed within 50 clock cycles. The final stage only outputs the remaining 175 soft bits from the dual port RAMs, within 175 clock cycles.

[Figure 4. FPGA based DCM soft demapper block diagram. Blocks shown: I and Q inputs, CSI, 50 x 8 Single Port RAM 1, 50 x 8 Single Port RAM 2, 50 x 16 Dual Port RAM1 to RAM4, 600 x 8 Dual Port RAM, Demapper, Comparator, 4 to 1 Soft bit MUX, DCMs at 320 MHz (or 400 MHz, or 480 MHz), FSM Controller.]

V. MAPPING DCM MAPPER AND DEMAPPER INTO FPGA
The target hardware for the circuitry is a Xilinx Virtex-4 FPGA. The Virtex-4 family offers solutions for telecommunication, wireless, networking and DSP applications [9]. The device used for this design is the XC4VFX12. The FX platform is optimized for embedded processing and high-speed serial connectivity. The device provides 1 million gates, including 5472 slices, 32 XtremeDSP slices and 36 RAM blocks of 18 Kbits each [9]. The internal clock speed of the device can reach 500 MHz, which satisfies the needs of the DCM soft mapper and demapper circuitry. The implementation is written in Verilog. The design is synthesized and mapped into the FPGA device using Xilinx Navigator 8.2i. The area result after the mapping and routing phase is summarized in Table 3. The synthesized circuitry meets the timing constraints for the system clock speeds of 320, 400 and 480 MHz.
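A software reference model is one way to check the output of a hardware demapper such as the one described here. The Python sketch below models the demapping rule (4) with the smaller of the two CSI values as the weight, as stated in Section IV, and runs a noise-free round trip through Table 2. It is a floating-point illustration under stated assumptions, not the fixed-point Verilog design: the function names are hypothetical, and the sign-based `hard_decision` slicer is an assumption added only to show that the soft bits carry the transmitted bits.

```python
# Table 2 columns, indexed by the 4 input bits read as an integer 0..15.
I_K = [-3] * 4 + [-1] * 4 + [1] * 4 + [3] * 4
Q_K = [-3, -1, 1, 3] * 4
I_K50 = [1] * 4 + [-3] * 4 + [3] * 4 + [-1] * 4
Q_K50 = [1, -3, 3, -1] * 4


def dcm_map(bits):
    """Map (b[g(k)], b[g(k)+1], b[g(k)+50], b[g(k)+51]) to two symbols via Table 2."""
    n = (bits[0] << 3) | (bits[1] << 2) | (bits[2] << 1) | bits[3]
    return complex(I_K[n], Q_K[n]), complex(I_K50[n], Q_K50[n])


def dcm_soft_demap(y_n, y_n50, csi_n, csi_n50):
    """Soft bits (b1, b2, b3, b4) from (4), weighted by the smaller CSI value."""
    csi = min(csi_n, csi_n50)
    b1 = (2 * y_n.real + y_n50.real) * csi
    b2 = (y_n.real - 2 * y_n50.real) * csi
    b3 = (2 * y_n.imag + y_n50.imag) * csi
    b4 = (y_n.imag - 2 * y_n50.imag) * csi
    return b1, b2, b3, b4


def hard_decision(soft):
    """Illustrative slicer (assumption): a positive soft bit is read as 1."""
    return tuple(1 if s > 0 else 0 for s in soft)


if __name__ == "__main__":
    # Noise-free round trip over all 16 input patterns with unit CSI.
    for n in range(16):
        bits = ((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)
        y_n, y_n50 = dcm_map(bits)
        soft = dcm_soft_demap(y_n, y_n50, 1.0, 1.0)
        print(bits, soft, hard_decision(soft) == bits)
```

In the noise-free case each soft bit has magnitude 5 times the CSI weight, because the combinations in (4) cancel the contribution of the other bit carried on the same pair of sub-carriers.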
Table 3. Area report for DCM soft Mapper and Demapper circuitry

             Slices   DSP48   Total Gates
Mapper       433      -       6887
Demapper     1448     4       24410
For the Mapper side, the circuitry needs 101 clock cycles to output the 100 complex symbols. For the Demapper side, the circuitry needs 276 clock cycles to output the 200 soft bits.

VI. CONCLUSION
Dual Carrier Modulation is adopted as the modulation scheme in the MB-OFDM UWB platform for data rates of 320 Mbit/s and above. An FPGA based DCM soft mapper and demapper have been designed and implemented as proposed. The implementation of the DCM soft mapper has been optimized using Espresso logic minimization. The dedicated demapping algorithm, which gives improved performance in high data rate transmission, particularly at 480 Mb/s, not only employs a related matrix factor to extract the soft bits but also exploits LS equalized CSI to improve the accuracy of the soft bits. The DCM soft demapper has been implemented by computing this demapping algorithm. The proposed DCM soft mapper and demapper circuits together occupy 1891 slices on the FPGA, which is approximately equal to 31297 gates. FPGA implementations call for an efficient, low-cost, high-performance solution, so logic and design size minimization are required. The FPGA design for the DCM demapper uses 24410 gates, so further optimization of the DCM demapper is needed. Currently the DCM mapper and demapper work individually, and synchronization issues will need to be addressed when they are assembled into a complete system. These items are left as future work.
REFERENCES

[1] WiMedia Alliance, http://www.wimedia.org/en/index.asp
[2] ECMA-368, “High Rate Ultra Wideband PHY and MAC Standard”, December 2005, http://www.ecma-international.org/publications/files/ECMA-ST/ECMA368.pdf
[3] A. Batra, et al., “Multi-band OFDM physical layer proposal for IEEE 802.15 task group 3a”, IEEE standard proposal P802.15-03, March 2004, http://grouper.ieee.org/groups/802/15/pub/2003/Jul03/03268r3P80215_TG3a-Multi-band-CFP-Document.doc
[4] A. Batra and J. Balakrishnan, “Improvements to the Multi-band OFDM Physical Layer”, 3rd IEEE Consumer Communications and Networking Conference, Volume 2, 8-10 Jan. 2006, pp. 701-705.
[5] W. Li, Z. Wang, Y. Yan, M. Tomisawa, “An efficient low-cost LS equalization in COFDM based UWB systems by utilizing channel-state-information (CSI)”, Vehicular Technology Conference, 2005. VTC-2005-Fall. 2005 IEEE 62nd, Sept. 2005, Vol. 4, pp. 67-71.
[6] R. Yang, R. S. Sherratt, “An Improved DCM Soft-Demapper for the MB-OFDM UWB Platform Exploiting Channel-State-Information”, IEEE/IET Signal Processing for Wireless Communications, London, 6-8 June 2007, in press.
[7] Espresso Tutorial, http://www.csl.cornell.edu/courses/eecs314/espressotutorial.pdf
[8] ESPRESSO: Logic Minimization Software, http://diamond.gem.valpo.edu/~dhart/ece110/espresso/tutorial.html
[9] Xilinx Inc. Online technical documentation, “Virtex-4 Family Overview” and “Virtex-4 User Guide”, http://www.xilinx.com/support/library.htm