2011 IEEE International Conference on Ultra-Wideband (ICUWB)

Design and Implementation of Advanced Algorithms for MIMO-UWB Wireless Communications

Emil Dimitrov, Claus Kupferschmidt, Thomas Kaiser, Juha Korpi, Risto Nordman, Antti Anttonen, Andrea Giorgetti, and Marco Chiani

Abstract—This paper describes the design and implementation of low-complexity multiple-input multiple-output (MIMO) signal processing techniques suitable for ultra-wideband (UWB) wireless communications. It provides the general system concept, description of the measurement set-up and modular FPGA platform for test and verification of the proposed multiple antenna schemes under real air-interface environment. We validate the performance of selected spatial multiplexing and diversity approaches applied to Multi-Band Orthogonal Frequency Division Multiplexing (MB-OFDM) and demonstrate the enhancements in capacity and link reliability of existing WiMedia devices through algorithm- and implementation-level investigations.

I. INTRODUCTION

MIMO systems have been widely considered a viable solution to overcome the current limits in wireless communication. The application of UWB to indoor environments, with their rich energy scattering, provides an ideal scenario for MIMO. By adding degrees of freedom for communication, multiple antennas can effectively turn multipath propagation, initially considered a drawback in wireless communications, into an advantage that increases the capacity of the system or improves its coverage and robustness in terms of error probability. While the benefits of MIMO have been widely studied and exploited in broadband MIMO-OFDM wireless communications, the real-world implementation and verification of multiple-antenna UWB solutions is still largely unexplored. In this paper, we have therefore focused on developing and investigating suitable MIMO approaches from the perspective of low-complexity algorithm design and implementation-level constraints, to allow for enhancements in data rate, link reliability and range of the current ECMA-368 standard [1]. The proposed schemes are first analyzed from the theoretical point of view and then implemented in an offline MIMO-UWB test-bed integrated with an FPGA platform for test and verification. Finally, we show measured performance and complexity results.

E. Dimitrov, C. Kupferschmidt, and T. Kaiser are with Leibniz University of Hanover, IKT, Hannover, Germany. Email: {emil.dimitrov, kupfers, thomas.kaiser}@ikt.uni-hannover.de
J. Korpi, R. Nordman, and A. Anttonen are with VTT Technical Research Centre of Finland, Oulu, Finland. Email: {Juha.Korpi, Risto.Nordman, Antti.Anttonen}@vtt.fi
A. Giorgetti and M. Chiani are with DEIS, WiLAB, University of Bologna, Cesena, Italy. Email: {andrea.giorgetti, marco.chiani}@unibo.it
This research was supported, in part, by the European Commission under the FP7 ICT integrated project CoExisting Short Range Radio by Advanced Ultra-WideBand Radio Technology (EUWB), Grant no. 215669.

978-1-4577-1764-2/11/$26.00 ©2011 IEEE

II. GENERAL CONCEPT OF A MIMO MB-OFDM SYSTEM

The general concept of a MIMO-UWB wireless system is based on a WiMedia MB-OFDM compliant data-path simulator featuring multiple antennas on both sides of the communication link. An example of such a system concept, based on MIMO signal processing in MATLAB and an offline MIMO-UWB test-bed, is shown in Figure 1. In the case of spatial multiplexing, multiple independent data streams are generated in the transmitter and uploaded to arbitrary waveform generators (AWGs) in order to increase the spectral efficiency of the system. The AWG directly converts the input I/Q streams to IF/RF signals that are transmitted synchronously over the channel. These signals are then acquired by up to four receive antennas, band-pass filtered, amplified, sampled and transferred back to MATLAB for further processing. Various receivers and MIMO diversity schemes can thus be easily implemented. In addition, modular FPGA platforms featuring designated MIMO algorithms can be integrated into the existing MATLAB chain.

Fig. 1. General concept of a MIMO MB-OFDM system.

III. LOW-COMPLEXITY MIMO-UWB ALGORITHMS FOR INCREASED CAPACITY AND LINK RELIABILITY

In the following we consider an $N_R \times N_T$ discrete complex MIMO signal model in the frequency domain, where the received signal at a certain frequency tone is given by (notation: see footnote 1)

$$\mathbf{Y} = \mathbf{H}\mathbf{S} + \mathbf{W} \tag{1}$$

with $\mathbf{S} = [S_1, \ldots, S_{N_T}]^T$, $\mathbf{Y} = [Y_1, \ldots, Y_{N_R}]^T$ and $\mathbf{W}$ being the transmitted tone vector, the received tone vector and zero-mean complex white Gaussian noise with covariance $\sigma^2 \mathbf{I}_{N_R}$, respectively. The MIMO channel matrix $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$ at a given frequency tone $k$ can be considered frequency flat. Its elements are the frequency responses $H_{i,j}(k, m)$ of the sub-paths between the $j$th transmit antenna and the $i$th receive antenna at symbol index $m$.

Footnote 1: Throughout the paper vectors and matrices are indicated by bold, $a_{i,j}$ is the $(i, j)$th element of $\mathbf{A}$, the expectation operator is denoted by $E\{\cdot\}$, the superscript $(\cdot)^T$ denotes transposition, the superscript $(\cdot)^H$ denotes conjugation and transposition, and $\mathbf{I}_N$ is an $N \times N$ identity matrix.

A. Ordered SIC with MMSE-based Sorted QR-Decomposition (MMSE-SQRD)

This section describes a computationally efficient algorithm for MIMO detection in V-BLAST architectures with respect to the MMSE criterion. It combines a sorted QR decomposition of the estimated channel matrix with successive interference cancellation (SIC). Unlike with linear receivers such as ZF and MMSE, the performance of SIC is dominated by the stream that is detected first, so error propagation has a considerable impact on the detection quality of the subsequent streams. An ordered SIC therefore starts by rearranging the columns of the channel matrix, attempting to identify the stream that is most likely to yield no detection errors in the first iterations. Although the ordering of the original V-BLAST algorithm [2] is optimum with respect to BER performance, it requires multiple calculations of the pseudo-inverse, which is computationally expensive. The MMSE extension of V-BLAST based on sorted QR decomposition was proposed in [3] to reduce the overall complexity to the order of linear receivers with negligible degradation in terms of BER. To perform the QR decomposition with respect to the MMSE criterion, the channel matrix $\mathbf{H}$ and the received signal $\mathbf{Y}$ are extended as

$$\breve{\mathbf{H}} = \begin{bmatrix} \mathbf{H} \\ \sigma \mathbf{I}_{N_T} \end{bmatrix} = \breve{\mathbf{Q}}\breve{\mathbf{R}} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix} \breve{\mathbf{R}} \quad \text{and} \quad \breve{\mathbf{Y}} = \begin{bmatrix} \mathbf{Y} \\ \mathbf{0} \end{bmatrix}, \tag{2}$$

with $\breve{\mathbf{Q}}$ and $\breve{\mathbf{R}}$ being a unitary and an upper triangular matrix, respectively, so that $\breve{\mathbf{H}}^H \breve{\mathbf{H}} = \mathbf{H}^H \mathbf{H} + \sigma^2 \mathbf{I}_{N_T}$. With the relation $\mathbf{Q}_1^H \mathbf{H} = \breve{\mathbf{R}} - \sigma \mathbf{Q}_2^H$, the least-squares solution of $\breve{\mathbf{Y}} = \breve{\mathbf{H}}\mathbf{S} + \breve{\mathbf{W}}$ can be obtained by back-substitution, solving

$$\mathbf{Q}_1^H \mathbf{Y} = \breve{\mathbf{R}}\mathbf{S} + \mathbf{Q}_1^H \mathbf{W} - \sigma \mathbf{Q}_2^H \mathbf{S}. \tag{3}$$
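The extended decomposition in (2) can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: it stacks H on top of σ·I (the scaling implied by the relation Q1^H H = R − σ Q2^H), uses a modified Gram-Schmidt procedure, and, as in the sorted variant of this subsection, processes the remaining column of smallest norm first. The function name `mmse_sqrd` and its return convention are hypothetical and not taken from the paper or from [3].

```python
import math


def mmse_sqrd(H, sigma):
    """Sorted QR decomposition of the MMSE-extended matrix [H; sigma*I].

    H is a list of N_R rows with N_T complex entries each.
    Returns (Q, R, perm): Q is (N_R + N_T) x N_T with orthonormal columns,
    R is N_T x N_T upper triangular, and perm is the column order, so that
    Q @ R equals the extended matrix with its columns taken in order perm.
    """
    n_r, n_t = len(H), len(H[0])
    # Extended matrix, stored column by column.
    cols = [[complex(H[i][j]) for i in range(n_r)] +
            [complex(sigma) if i == j else 0j for i in range(n_t)]
            for j in range(n_t)]
    R = [[0j] * n_t for _ in range(n_t)]
    perm = list(range(n_t))
    # Column norms are computed once and only updated afterwards.
    norms = [sum(abs(x) ** 2 for x in c) for c in cols]
    for i in range(n_t):
        # Pick the remaining column with the smallest residual norm.
        k = min(range(i, n_t), key=lambda l: norms[l])
        cols[i], cols[k] = cols[k], cols[i]
        norms[i], norms[k] = norms[k], norms[i]
        perm[i], perm[k] = perm[k], perm[i]
        for row in R[:i]:
            row[i], row[k] = row[k], row[i]
        r_ii = math.sqrt(norms[i])
        R[i][i] = complex(r_ii)
        cols[i] = [x / r_ii for x in cols[i]]
        # Remove the component along the new basis vector from the rest.
        for l in range(i + 1, n_t):
            r_il = sum(q.conjugate() * x for q, x in zip(cols[i], cols[l]))
            R[i][l] = r_il
            cols[l] = [x - r_il * q for x, q in zip(cols[l], cols[i])]
            norms[l] -= abs(r_il) ** 2
    Q = [[cols[j][i] for j in range(n_t)] for i in range(n_r + n_t)]
    return Q, R, perm
```

The first N_R rows of Q play the role of Q1 and the last N_T rows that of Q2. Because the columns are permuted, the detected symbols have to be mapped back through `perm` after back-substitution.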

The last term in (3) constitutes the remaining interference that cannot be removed by successive interference cancellation. The optimum detection sequence maximizes the SINR on each layer, yielding the minimum estimation error at each iteration. The most computationally expensive part of the presented algorithm is the QR decomposition of the extended channel matrix $\breve{\mathbf{H}}$. It is well known that methods based on unitary transformations (e.g. Givens rotations) are numerically more stable than the Gram-Schmidt procedure [4]. However, for FPGA-based implementation, multiplication-intensive algorithms such as the modified Gram-Schmidt QR decomposition in [5] are usually more desirable than CORDIC-intensive Givens rotations. The sorted version of the MMSE detection is essentially an extension of the modified Gram-Schmidt procedure that re-orders the columns of the channel matrix prior to each orthogonalization step. The basic idea is that the diagonal elements of the $N_T \times N_T$ upper triangular matrix $\breve{\mathbf{R}}$ are minimized in the order in which they are computed $(1, \ldots, N_T)$. Starting with the norm of the first column of $\breve{\mathbf{H}}$, the rows of $\breve{\mathbf{R}}$ are successively determined until the extended channel matrix is transformed into a unitary, column-ordered $(N_T + N_R) \times N_T$ matrix $\breve{\mathbf{Q}}$. Note that the column norms have to be calculated only once at the beginning and can be easily updated afterwards, so sorting adds little computational overhead. The MMSE-SQRD does not necessarily lead to the optimum detection sequence, but in many cases of interest the performance degradation is negligible compared with the reduction in complexity.

B. Orthogonal 4 × 2 MIMO Space-Time-Frequency Codec

In the 4 × 2 MIMO space-time-frequency coding (STFC) transmission scheme we have four transmit and two receive antennas. The MIMO coding operates on frequency-domain symbols and the channel is modelled by its frequency responses, which implies OFDM processing at both transmitter and receiver. Our STFC approach is based on the orthogonal code first introduced in [6]. We used the slightly refined version given in [7]. The generators are equivalent, but the presentation in [7] is more concise, involving only signal constellation points and their complex conjugates. The generator matrix of the code [7] is

$$\mathbf{S} = \begin{bmatrix} S_1 & -S_2^* & -S_3^* & 0 \\ S_2 & S_1^* & 0 & -S_3^* \\ S_3 & 0 & S_1^* & S_2^* \\ 0 & S_3 & -S_2 & S_1 \end{bmatrix},$$

where $[S_1, S_2, S_3]$ are the three information-bearing constellation points. Each row of $\mathbf{S}$ is transmitted from one of the four antennas, and the four columns correspond to four successive instants. The successive instants can lie in time or in frequency, depending on how stable the channel is along the given dimension. In our case the channel is frequency selective but relatively stable in time; hence the MIMO coding is applied over four consecutive OFDM symbols $m, \ldots, m + 3$ along each subtone separately.
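According to the signal model, $\mathbf{H}$ is the 4 × 2 channel frequency response, assumed stable in time, i.e., $H_{i,j}(m) \approx H_{i,j}(m + 1) \approx H_{i,j}(m + 2) \approx H_{i,j}(m + 3)$, and $\mathbf{W}$ is the 4 × 2 noise matrix. The MIMO decoding and combining is done at the same time by applying the equation

$$\hat{S}_k = \frac{\hat{Y}_k}{\sum_{j=1}^{2} \sum_{i=1}^{4} H_{i,j}^* H_{i,j}}, \qquad k = 1, 2, 3 \tag{4}$$

where

$$\begin{aligned}
\hat{Y}_1 &= \sum_{j=1}^{2} \left[ Y_j(m) H_{1,j}^* + Y_j^*(m+1) H_{2,j} + Y_j^*(m+2) H_{3,j} + Y_j(m+3) H_{4,j}^* \right] \\
\hat{Y}_2 &= \sum_{j=1}^{2} \left[ Y_j(m) H_{2,j}^* - Y_j^*(m+1) H_{1,j} - Y_j(m+2) H_{4,j}^* + Y_j^*(m+3) H_{3,j} \right] \\
\hat{Y}_3 &= \sum_{j=1}^{2} \left[ Y_j(m) H_{3,j}^* + Y_j(m+1) H_{4,j}^* - Y_j^*(m+2) H_{1,j} - Y_j^*(m+3) H_{2,j} \right]
\end{aligned} \tag{5}$$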

For brevity, the OFDM symbol index $m$ is only marked on $Y_j$. For zero-mean uncorrelated noise, equations (5) constitute unbiased minimum variance estimators of the transmitted symbols, which can be verified by substituting the signal-model expressions of the received symbols into them. Hence, (5) can be considered ideal soft decisions on the transmitted constellation points. We note that two receive antennas is just a test case. The signal model as well as the decoding and combining equations can easily be extended to any number of receive antennas. All that is needed is to add the appropriate rows to the matrices $\mathbf{H}$, $\mathbf{Y}$, and $\mathbf{W}$ and to extend the summations in (4) and (5) over the correct number of receive antennas.

In order to obtain low bit error rates at the convolutional decoder, the soft decisions must be carried to the decoder input via the demodulator. We extracted the bitwise log-likelihood ratio (LLR) from the estimates (5), a measure that contains all available information and is directly applicable to the Viterbi decoding algorithm. We assume Gray mapping of the bit vectors onto the constellation points, which maximizes the sum of the obtained LLR values. Complexity was reduced by using only two points per bit, the closest point for bit one and the closest point for bit zero, which gives optimum results for QPSK and very close to optimum results for 16-QAM.
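The orthogonal combining of this subsection can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it takes row i of the generator matrix as the sequence sent from antenna i over four successive instants, assumes a channel that is constant over those four instants, and applies the combining rules (4) and (5) without noise. The function names and the `H[i][j]` indexing (transmit antenna i, receive antenna j) are hypothetical choices for this example.

```python
def stfc_encode(s1, s2, s3):
    """Code matrix: row = transmit antenna, column = time instant."""
    c = lambda z: z.conjugate()
    return [
        [s1, -c(s2), -c(s3), 0j],
        [s2, c(s1), 0j, -c(s3)],
        [s3, 0j, c(s1), c(s2)],
        [0j, s3, -s2, s1],
    ]


def stfc_channel(S, H):
    """Noise-free received samples Y[j][t] at receive antenna j, instant t."""
    n_rx = len(H[0])
    return [[sum(H[i][j] * S[i][t] for i in range(4)) for t in range(4)]
            for j in range(n_rx)]


def stfc_decode(Y, H):
    """Combine the four received samples per antenna into three estimates."""
    c = lambda z: z.conjugate()
    n_rx = len(H[0])
    # Denominator of (4): total channel energy over all links.
    gain = sum(abs(H[i][j]) ** 2 for i in range(4) for j in range(n_rx))
    y1 = y2 = y3 = 0j
    for j in range(n_rx):
        h1, h2, h3, h4 = (H[i][j] for i in range(4))
        r0, r1, r2, r3 = Y[j]
        # Combining rules of (5), one line per estimate.
        y1 += r0 * c(h1) + c(r1) * h2 + c(r2) * h3 + r3 * c(h4)
        y2 += r0 * c(h2) - c(r1) * h1 - r2 * c(h4) + c(r3) * h3
        y3 += r0 * c(h3) + r1 * c(h4) - c(r2) * h1 - c(r3) * h2
    return [y1 / gain, y2 / gain, y3 / gain]
```

Without noise, decoding the output of `stfc_channel(stfc_encode(s1, s2, s3), H)` returns the three transmitted points up to rounding error, because all cross terms cancel. Adding receive antennas only adds columns to `H` and rows to `Y`.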

IV. THEORETICAL ANALYSIS

The theoretical analysis of the MIMO systems described above covers the capacity outage probability and the bit error probability (BEP), assuming Rayleigh fading channels with normalized average gains, i.e., $E\{|h_{i,j}|^2\} = 1\ \forall i, j$. With this normalization, the signal-to-noise ratio (SNR), defined as the ratio of the total transmitted power $P_t$ to the noise power per antenna $\sigma^2$, is $\mathrm{SNR} = P_t / \sigma^2$. It represents the SNR measurable at each receive antenna element. If the total transmission rate is $R$ [information b/s/Hz], the energy transmitted per information bit is $E_b = P_t T / R$, where $T$ is the signaling period. We can then define the ratio $E_b / N_0 = P_t / (R \sigma^2) = \mathrm{SNR} / R$, where $N_0$ is the single-sided power spectral density of the thermal noise. Note that $E_b$ represents the average energy per bit at each receive antenna, due to the normalization imposed on the channel gains.

The $2 \times N_R$ MIMO system with the appropriate processing of Alamouti's scheme can be interpreted as a single-input single-output (SISO) system [8]. In particular, all modulation and coding/decoding techniques for the SISO channel can be applied. The virtual SISO channel obtained by Alamouti's scheme has output SNR [8]

$$\mathrm{SNR}_{\mathrm{Ala}} = \frac{P_t}{2\sigma^2} \|\mathbf{H}\|_F^2 = \frac{P_t}{2\sigma^2} \sum_{i=1}^{N_R} \sum_{j=1}^{2} |h_{i,j}|^2. \tag{6}$$

From this expression we see that Alamouti's scheme uses the full $2 \cdot N_R$ diversity available. In this scheme, specifically designed for $N_T = 2$, two symbols are transmitted in two symbol periods, so the encoder has rate 1. As described in the previous section, for $N_T = 4$ orthogonal codes can be designed with rate 3/4, i.e., 3 symbols are transmitted over $4T$ (OSTBC). It can be shown that the $4 \times N_R$ MIMO system with OSTBC is equivalent to a SISO channel with SNR given by

$$\mathrm{SNR}_{\mathrm{OSTBC}} = \frac{P_t}{3\sigma^2} \|\mathbf{H}\|_F^2 = \frac{P_t}{3\sigma^2} \sum_{i=1}^{N_R} \sum_{j=1}^{4} |h_{i,j}|^2 \tag{7}$$

where the factor 3 in the SNR is due to the fact that in each interval $T$, 3 symbols are transmitted, so $P_t = 3E_s$, where $E_s$ is the energy per transmitted symbol. The end-to-end capacity of the equivalent SISO channel obtained from the $2 \times N_R$ MIMO system with Alamouti's scheme is

$$C(\mathbf{H}) = \log_2\left(1 + \frac{P_t}{2\sigma^2} \|\mathbf{H}\|_F^2\right) \quad \text{[info b/s/Hz]} \tag{8}$$

while for OSTBC, where 3 symbols are transmitted in $4T$, it becomes

$$C(\mathbf{H}) = \frac{3}{4} \log_2\left(1 + \frac{P_t}{3\sigma^2} \|\mathbf{H}\|_F^2\right) \quad \text{[info b/s/Hz]} \tag{9}$$

These expressions will be used later to evaluate the theoretical performance of MIMO with powerful (near-capacity) error correcting codes.

A. Performance of uncoded Alamouti's and OSTBC schemes

The above schemes can be used with coding and modulation designed for SISO channels, so their performance is that of SISO systems with the SNR given by (6) or (7). In Fig. 2 we report the BEP for uncoded Alamouti's and OSTBC schemes with $M$-ary quadrature amplitude modulation (M-QAM) over (2 × 2) and (4 × 2) uncorrelated MIMO Rayleigh fading channels, obtained with the formulas in [9]. From the curves we can see that, for a given spectral efficiency, the 4 × 2 OSTBC scheme is to be preferred at large SNRs. This comparison will also be made in the following section assuming powerful error correcting codes.

Fig. 2. BEP versus Eb/N0 [dB] for uncoded Alamouti's (2 × 2) and OSTBC (4 × 2) schemes with M-QAM. From left to right: M = 4, 16, 64, 256. The corresponding information rates are 2, 4, 6, 8 [bit/s/Hz] for the (2 × 2) Alamouti's scheme and 1.5, 3, 4.5, 6 [bit/s/Hz] for the (4 × 2) OSTBC scheme.

B. Performance of coded Alamouti's and OSTBC schemes

We assume a block fading channel (BFC), where the channel is constant over a block composed of several symbols and takes independent, identically distributed (i.i.d.) values across different blocks.
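The equivalent-SISO expressions (6) to (9) can be evaluated directly for a given channel realization. The following sketch is a minimal illustration with hypothetical helper names: it assumes that `H` is passed as a list of rows of complex gains, that `pt` and `sigma2` are the total transmit power and the per-antenna noise power, and it uses the relation Eb/N0 = SNR/R from this section.

```python
import math


def frobenius_sq(H):
    """Squared Frobenius norm: sum of |h_ij|^2 over all links."""
    return sum(abs(h) ** 2 for row in H for h in row)


def snr_alamouti(H, pt, sigma2):
    """Output SNR of the virtual SISO channel, as in (6)."""
    return pt / (2.0 * sigma2) * frobenius_sq(H)


def snr_ostbc(H, pt, sigma2):
    """Output SNR of the rate-3/4 OSTBC with four transmit antennas, as in (7)."""
    return pt / (3.0 * sigma2) * frobenius_sq(H)


def capacity_alamouti(H, pt, sigma2):
    """Capacity in info b/s/Hz, as in (8)."""
    return math.log2(1.0 + snr_alamouti(H, pt, sigma2))


def capacity_ostbc(H, pt, sigma2):
    """Capacity in info b/s/Hz, as in (9); 3 symbols are sent in 4T."""
    return 0.75 * math.log2(1.0 + snr_ostbc(H, pt, sigma2))


def ebn0_db(pt, sigma2, rate):
    """Eb/N0 in dB from SNR = Pt / sigma^2 and rate R in info b/s/Hz."""
    return 10.0 * math.log10(pt / sigma2 / rate)
```

For the same total power, the OSTBC channel collects the energy of twice as many links, but pays the 3/4 rate factor in front of the logarithm.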

Fig. 4. The experimental testbed of a MIMO MB-OFDM system.

Fig. 3. Example of outage probability Pout versus Eb/N0 [dB] for R = 1, 2 [info b/s/Hz], no CSIT, MIMO (NT × NR) uncorrelated Rayleigh channel. Curves: 2 × 2 Alamouti and 4 × 2 OSTBC at 1 and 2 b/s/Hz, and Alamouti at 1 and 2 b/s/Hz with perfect interleaving.
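Outage curves of the kind shown in Fig. 3 can be approximated by simulation. The sketch below is a simple Monte Carlo estimate of Pr{C(H) < R}, with C(H) taken from (8) or (9), and is not the method used to produce the figure. It assumes i.i.d. Rayleigh gains with unit average power and the relation Pt/σ² = R · Eb/N0. The function names, the trial count and the seed are arbitrary choices for this example.

```python
import math
import random


def rayleigh_frobenius_sq(n_t, n_r, rng):
    """Draw ||H||_F^2 for i.i.d. complex Gaussian gains with E{|h|^2} = 1."""
    s = math.sqrt(0.5)  # standard deviation of the real and imaginary parts
    return sum(rng.gauss(0.0, s) ** 2 + rng.gauss(0.0, s) ** 2
               for _ in range(n_t * n_r))


def outage_probability(scheme, rate, ebn0_db, n_r=2, trials=20000, seed=1):
    """Estimate Pr{C(H) < rate} for 'alamouti' (2 x n_r) or 'ostbc' (4 x n_r)."""
    if scheme == "alamouti":
        n_t, pre_log, divisor = 2, 1.0, 2.0
    elif scheme == "ostbc":
        n_t, pre_log, divisor = 4, 0.75, 3.0
    else:
        raise ValueError("unknown scheme: " + scheme)
    rng = random.Random(seed)
    snr = rate * 10.0 ** (ebn0_db / 10.0)  # Pt / sigma^2 = R * Eb/N0
    outages = 0
    for _ in range(trials):
        g = rayleigh_frobenius_sq(n_t, n_r, rng)
        capacity = pre_log * math.log2(1.0 + snr / divisor * g)
        if capacity < rate:
            outages += 1
    return outages / trials
```

Sweeping `ebn0_db` for both schemes at the same rate gives an estimate of the corresponding slow-fading curves. Outage probabilities far below 1/`trials` cannot be resolved without increasing the number of trials.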

For MIMO channels without CSIT the ergodic capacity is defined as $E\{C(\mathbf{H})\}$, where $C(\mathbf{H})$ is given in (8) and (9) and the expectation is taken with respect to the distribution of $\mathbf{H}$. This ergodic capacity is valid for systems without CSIT with coding and perfect time-interleaving (fast fading). For MIMO channels without CSIT, when the codeword length is of the order of the fading block (slow fading), and assuming a constant information rate $R$ [information b/s/Hz] at the channel input, we define the channel outage probability as

$$P_{\mathrm{out}}(R) = \Pr\{C(\mathbf{H}) < R\} \tag{10}$$

which represents a lower bound on the codeword error rate for coded systems. In other words, coded systems with powerful codes and perfect interleaving will approach the ergodic capacity $E\{C(\mathbf{H})\}$, in the sense that the codeword error rate (CER) is zero for $R < E\{C(\mathbf{H})\}$ and one otherwise; for slow fading and no interleaving, on the other hand, the CER is lower bounded by the outage probability (10). The outage probability vs. SNR curves for uncorrelated MIMO Rayleigh channels with spectral efficiencies $R = 1$ and $R = 2$ [b/s/Hz] are reported in Fig. 3. We note that the 4 × 2 OSTBC scheme provides a diversity order of 8 (see footnote 2), while the 2 × 2 Alamouti's scheme has diversity order 4. Also, the 4 × 2 OSTBC scheme has the best performance for the error probability range of interest.

Footnote 2: The curves have a slope of 10/8 [dB/decade].

V. EXPERIMENTAL OFFLINE MIMO-UWB TEST-BED AND FPGA PLATFORM

In order to examine the performance of the proposed MIMO algorithms under realistic conditions and with minimum effort, a reconfigurable offline multi-antenna test-bed has been set up. Depending on the specific application scenario, the selected MIMO scheme can thus be developed in the common MATLAB environment, easily uploaded to the MIMO test-bed, transmitted over the air, and finally verified under the effects of real-world phenomena. While this approach preserves the flexibility and modularity of algorithm development, it allows for extensive research on and evaluation of the performance gains achievable with multiple antennas under realistic propagation conditions.

In this set-up, the transmitter is composed of two Tektronix AWG7102 signal generators, each supporting two channels with up to 10 Gsample/s, 5.8 GHz analog bandwidth and 32 Msample waveform length per channel. The AWG enables the direct generation of RF signals, via the DAC, up to an effective frequency of 5.8 GHz and with the capability to add real-world signal imperfections. The direct generation of IF or RF signals avoids the I/Q degradations and time-consuming adjustments associated with traditional methods using I/Q modulators. On the receiver side, a digital phosphor oscilloscope (Tektronix DPO71604) provides 4 channels with up to 50 Gsample/s per channel and 16 GHz analog frequency span. Among its time-, spectral and PSD testing capabilities, the DPO has mainly been used to reliably capture the transmitted MIMO waveforms for further offline processing and algorithm verification.

Regarding the FPGA platform, the Xilinx ML605 board [10] was selected for the hardware-in-the-loop MATLAB simulations because of its large, high-speed XC6VLX240T FPGA and its convenient interfaces to the host computer. Because computation on the FPGA is considerably faster than on standard processors, the hardware-in-the-loop simulation allows more realistic data rates in the hardware part, even though all other parts of the simulation run in slower MATLAB. Three MIMO decoders have been implemented on the FPGA platform: a 2 × 2 standard MMSE decoder, a 3 × 4 ordered SIC decoder with MMSE-SQRD and a 4 × 2 STFC decoder with soft LLR demodulator. The selected algorithms have been implemented using the VHDL language and Xilinx System Generator. The design uses hardware-friendly parallel fixed-point logic and deep pipelining to maximize the throughput.

VI. EXPERIMENTAL RESULTS

A. Algorithm-level investigations

The theoretical analysis has shown that the OSTBC (4 × 2) MIMO system has, for a given spectral efficiency, a better performance in terms of outage probability than Alamouti's scheme. Even though the theoretical analysis applies to powerful error correcting codes, we have verified similar behaviour in MIMO-OFDM transmission systems with standard convolutional codes having a memory length of six. In particular, the channel was estimated from training symbols that were transmitted at the beginning of each frame; it was kept constant during the OFDM frame and changed at the beginning of a new frame. The correlation between the antenna links was generated by a simplified model reported in [11]. With the Alamouti code, we applied a convolutional code rate of 1/2, whereas with the orthogonal code for four transmit antennas the code was punctured to rate 3/4, leading to a total rate of 9/16. Some simulation results are presented in Figures 5 and 6, where the additional gain over the Alamouti code was 2 to 3 dB, and it appears to be even larger when very small bit error rates are of concern. This agrees with the fact that the two systems have different diversity orders (see Section IV). It is also notable that the largest gain over the Alamouti code was attained when inaccuracies in channel estimation were included in the simulation.

Fig. 5. Comparison of the orthogonal code performance in CM1 channel for QPSK and NR = 2.

Fig. 6. Comparison of the orthogonal code performance in CM1 channel for 16-QAM and NR = 2.

B. Implementation-level investigations

FPGA synthesis results obtained with the Synplify Pro synthesis tool are presented in Table I. All MIMO decoders are relatively complex, consuming more than ten thousand lookup tables (LUTs) and more than one hundred DSP48 units. One DSP48 unit can calculate, for example, a real multiplication in one clock period. The 3 × 4 SIC-based decoder is the most complex one, mainly because it has the largest number of antennas. The FPGA resource consumption of the 2 × 2 MMSE and 4 × 2 STFC decoders is quite similar despite the larger number of antennas in the latter, which is explained by the typically lower complexity of STFC decoders, where no matrix inversion is needed. Additionally, the 4 × 2 STFC decoder also calculates the soft LLR values, while the MMSE decoder only outputs the detected symbol estimates. The maximum information bit throughput rates in Table I have been calculated for HDR transmission with code rate 3/4 and 16-QAM modulation, assuming an input rate of one data subcarrier per antenna at every clock period at the given clock frequency. For example, the 2 × 2 MMSE decoder detects two streams at 210 MHz, giving 210 MHz × 2 × 4 bit × 3/4 = 1260 Mbit/s.

TABLE I
VHDL SYNTHESIS TO THE FPGA

Resource               2 × 2 MMSE     3 × 4 SIC      4 × 2 STFC
Max clock frequency    210 MHz        160 MHz        160 MHz
Max throughput         1260 Mbit/s    1440 Mbit/s    360 Mbit/s
LUTs                   12551 (8%)     48560 (31%)    14318 (9%)
DSP48 blocks           137 (17%)      236 (30%)      122 (15%)
Block RAM              29 (6%)        185 (44%)      20 (4%)

VII. CONCLUSION

In this paper we have described novel MIMO signal processing methods suitable for UWB communications that reduce the complexity of spatial demultiplexing and maximize the soft information available at the input of the channel decoder. A comprehensive hybrid test-bed with a real air-interface environment has been developed, and selected MIMO methods have been validated using an FPGA board.

REFERENCES

[1] Ecma International, “High Rate Ultra Wideband PHY and MAC Standard,” Standard ECMA 368, Dec. 2008.
[2] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Tech. J., vol. 1, no. 2, pp. 41–59, 1996.
[3] D. Wubben, R. Bohnke, V. Kuhn, and K. D. Kammeyer, “MMSE extension of V-BLAST based on sorted QR decomposition,” in Proc. IEEE Veh. Tech. Conf. (VTC 2003-Fall), vol. 1, 2003, pp. 508–512.
[4] A. Burg, “VLSI circuits for MIMO communication systems,” Ph.D. dissertation, ETH, 2006.
[5] P. Luethi, C. Studer, S. Duetsch, E. Zgraggen, H. Kaeslin, N. Felber, and W. Fichtner, “Gram-Schmidt-based QR decomposition for MIMO detection: VLSI implementation and comparison,” in Proc. IEEE Asia Pacific Conf. on Circ. and Sys. (APCCAS 2008), 2008, pp. 830–833.
[6] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Trans. on Inf. Theory, vol. 45, no. 5, pp. 1456–1467, 1999.
[7] G. B. Giannakis, Z. Liu, X. Ma, and S. Zhou, Space Time Coding for Broadband Wireless Communications. Wiley-Interscience, 2003.
[8] M. Chiani, The Communications Handbook. CRC Press, 2011, ch. MIMO Systems.
[9] A. Conti, M. Chiani, and M. Z. Win, “Slow adaptive M-QAM with diversity in fast fading and shadowing,” IEEE Trans. Commun., vol. 55, no. 5, pp. 895–905, May 2007.
[10] [Online]. Available: http://www.xilinx.com/products/devkits/EK-V6ML605-G.htm
[11] J. Adeane, W. Q. Malik, I. J. Wassell, and D. J. Edwards, “Simple correlated channel model for ultrawideband multiple-input multiple-output systems,” IET Microwaves, Antennas & Propagation, vol. 1, no. 6, pp. 1177–1181, 2007.
