Algorithm and VLSI Architecture for Linear MMSE Detection in MIMO-OFDM Systems

A. Burg, S. Haene, D. Perels, P. Luethi, N. Felber and W. Fichtner
Integrated Systems Laboratory, ETH Zurich, Switzerland
{apburg,haene,perels,luethi,felber,fw}@iis.ee.ethz.ch

Abstract- The paper describes an algorithm and a corresponding VLSI architecture for the implementation of linear MMSE detection in packet-based MIMO-OFDM communication systems. The advantages of the presented receiver architecture are low latency, high throughput, and efficient resource utilization, since the hardware required for the computation of the MMSE estimators is reused for the detection. The algorithm also supports the extraction of soft information for channel decoding.

I. INTRODUCTION

Multiple-input multiple-output (MIMO) wireless communication systems [1] employ multiple antennas at the transmitter and at the receiver to increase system capacity and to achieve better quality of service. In spatial multiplexing mode, MIMO systems reach higher peak data rates without increasing the bandwidth of the system by transmitting multiple data streams in parallel in the same frequency band. Orthogonal frequency division multiplexing (OFDM) is a modulation scheme that is robust against interference arising from multipath propagation. Consequently, many upcoming standards for high-throughput wireless communication such as IEEE 802.11n and IEEE 802.16 rely on a combination of MIMO with OFDM.

Unfortunately, the performance improvements of MIMO technology also entail a considerable increase in signal processing complexity, in particular for the separation of the parallel data streams. Hence, a major challenge associated with the implementation of future wireless communication systems is the design of low-complexity MIMO detection algorithms and corresponding VLSI architectures.

In this work, we consider the VLSI implementation of linear MMSE detection for wideband MIMO-OFDM systems. A suboptimal linear detection scheme is contemplated since implementations of algorithms with better performance (e.g., [2], [3], [4]) either do not meet the high throughput requirements for MIMO-WLAN (especially not on FPGAs) or lack the ability to provide soft information for channel decoding with low hardware complexity.

A. System Model and Requirements

The system under consideration is a packet-based MIMO-OFDM system with $M_T$ transmit and $M_R$ receive antennas.

0-7803-9390-2/06/$20.00 ©2006 IEEE

Fig. 1. Timing diagram of MIMO detection process in packet-based MIMO-OFDM systems. (Diagram labels: data frame, idle, MIMO detection, detection latency.)

time index $t$ on the $k$th tone of the OFDM signal. After proper OFDM modulation at the transmitter and demodulation at the receiver, the corresponding received vector $y[k,t]$ is given by

$$y[k,t] = H[k]\,s[k,t] + n[k,t], \tag{1}$$

where the $M_R \times M_T$-dimensional matrix $H[k]$ describes the effective MIMO channel for the $k$th tone and the vector $n[k,t]$ models the thermal noise in the system as i.i.d. proper complex Gaussian with variance $\sigma^2$ per complex dimension. Assuming knowledge of the channel matrices, the linear MMSE estimator for each tone is given by

$$G[k] = \left(H^H[k]\,H[k] + M_T\sigma^2 I\right)^{-1} H^H[k] \tag{2}$$

and linear MIMO detection corresponds to a straightforward matrix-vector multiplication according to

$$\hat{s}[k,t] = G[k]\,y[k,t] \tag{3}$$

followed by quantization of the entries of $\hat{s}[k,t]$ to the nearest constellation point.

The difficulty in the implementation of linear receivers for packet-based MIMO-OFDM systems arises from the frame structure because the initial training phase, during which the receiver obtains knowledge of $H[k]$, is immediately followed by data. Since the detection of the data according to (3) only starts when the MMSE estimators for all $K$ data-carrying tones have been computed, the delay incurred by the preprocessing according to (2) translates directly into detection latency as illustrated in Fig. 1. In MIMO-OFDM receiver implementations [5], this latency is responsible for considerable memory requirements to buffer the received vectors and can cause problems ...


Authorized licensed use limited to: Texas A M University. Downloaded on March 24, 2009 at 03:08 from IEEE Xplore. Restrictions apply.

ISCAS 2006

of packet-based MIMO-OFDM receivers. However, it is also noted that the corresponding operation is only performed once at the start of the frame so that, without special provisions, the potentially costly hardware for the preprocessing will be idle most of the time.

Contribution: In this paper an algorithm for efficient tone-by-tone linear preprocessing of channel state information in MIMO-OFDM systems is presented, together with a hardware-efficient VLSI architecture for its realization. The described receiver constitutes the basis for the soft-output demapper described in [6], which yields a 5-6 dB gain in terms of signal-to-noise ratio (SNR) over a hard-decision MMSE decoder. The reported ASIC and FPGA area and performance figures provide a reference for the true silicon complexity of linear MMSE receivers for MIMO-OFDM systems.

Outline: The next section introduces the algorithm for the computation of the linear MMSE detectors. Section III describes a scalable VLSI architecture for the proposed algorithm. Area and performance figures for ASIC and FPGA implementations are provided in Section IV. Section V concludes the paper.
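The detection rule of (2) and (3) can be illustrated for a single tone with a short floating-point sketch. It is not the paper's hardware algorithm: the helper names (`hermitian`, `matmul`, `inverse`, `mmse_estimator`, `detect`), the QPSK constellation, and the example channel are assumptions made for illustration, and the Gauss-Jordan inversion is only a reference method for forming $G[k]$.

```python
# Reference sketch of linear MMSE detection for one OFDM tone, eqs. (1)-(3).
# All function names and the example values are illustrative assumptions.

def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))]
            for c in range(len(A[0]))]

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (reference only)."""
    n = len(A)
    M = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mmse_estimator(H, sigma2):
    """G = (H^H H + M_T sigma^2 I)^-1 H^H, as in eq. (2)."""
    MT = len(H[0])
    Hh = hermitian(H)
    A = matmul(Hh, H)
    for i in range(MT):
        A[i][i] += MT * sigma2
    return matmul(inverse(A), Hh)

QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]

def detect(G, y):
    """Eq. (3), followed by slicing to the nearest constellation point."""
    s_hat = [sum(g * v for g, v in zip(row, y)) for row in G]
    return [min(QPSK, key=lambda c: abs(c - s)) for s in s_hat]

# Example: 2x2 channel, noise-free received vector y = H s.
H = [[1 + 0.5j, 0.2 - 0.1j],
     [0.3 + 0.2j, 0.9 - 0.4j]]
s = [1 + 1j, -1 + 1j]
y = [sum(h * x for h, x in zip(row, s)) for row in H]
G = mmse_estimator(H, sigma2=0.01)
print(detect(G, y))
```

In a packet-based receiver, `mmse_estimator` would run once per tone after training, while `detect` runs for every received vector of the frame, which is the imbalance the text describes.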

II. PREPROCESSING ALGORITHM

Algorithm choices for the implementation of (2) are either based on QR-decomposition [7] using unitary transformations or on direct matrix inversion algorithms with conventional arithmetic. The main advantages of the QR approach lie in its favorable numerical properties in fixed-point implementations and in the availability of a wide range of regular array architectures [8], [9] for its implementation. The main arguments for direct matrix inversion are the lower number of operations compared to QR decomposition and the fact that the matrix $(H^H[k]H[k] + M_T\sigma^2 I)^{-1}$ is produced as an intermediate result. In fact, the diagonal entries of this matrix are required for the computation of soft-outputs [10], [6].

The implementation that is described in this paper relies on direct matrix inversion. The corresponding algorithm is borrowed from the updating procedure of the Kalman gain in Kalman filtering applications. The basic idea is to start from the trivial inverse of $M_T\sigma^2 I$ and to obtain $(H^H H + M_T\sigma^2 I)^{-1}$¹ through a series of $M_R$ rank-one updates by using the matrix inversion lemma. The iteration is initialized by setting

$$P^{(0)} = \frac{1}{M_T\sigma^2}\, I \tag{4}$$

and proceeds by computing

$$P^{(j)} = P^{(j-1)} - \frac{P^{(j-1)} H_j^H H_j P^{(j-1)}}{1 + H_j P^{(j-1)} H_j^H}, \tag{5}$$

where $H_j$ denotes the $j$th row of $H$. After $M_R$ iterations, $P^{(M_R)} = (H^H H + M_T\sigma^2 I)^{-1}$. The complexity of the above described algorithm in terms of the number of multiplications² and divisions is given by

$$C_{\mathrm{Mult}} = \tfrac{5}{2} M_R M_T^2 + \tfrac{5}{2} M_R M_T - M_T^2 + M_T, \qquad C_{\mathrm{Div}} = M_R. \tag{6}$$

In order to map recursion (5) to hardware, its compact mathematical description is expanded as shown in Alg. 1. The operation sequence is designed to reduce the dynamic range of intermediate results and to minimize the number of costly divisions, while keeping the number of multiplications low.

Algorithm 1 Algorithm for computing the MMSE estimator
1: $P^{(0)} = \frac{1}{M_T\sigma^2} I$
2: for $j = 1 \ldots M_R$ do
3:   $g = P^{(j-1)} H_j^H$
4:   $S = 1 + H_j g$ (note that $S$ is strictly positive)
5:   $S_e = \lceil \log_2 S \rceil$, $S_m = 2^{S_e}/S$
6:   $\tilde{g} = S_m g$
7:   $P^{(j)} = P^{(j-1)} - \tilde{g}\, g^H\, 2^{-S_e}$
8: end for
9: $G = P^{(M_R)} H^H$

III. VLSI ARCHITECTURE

The choice of a suitable hardware architecture for the implementation of Alg. 1 depends on the system specifications and on the available area: The most area-efficient solution is a fully decomposed, processor-like architecture. However, such a minimum-area solution cannot meet the low-latency requirements of MIMO-OFDM systems. A highly parallel architecture achieves higher throughput but suffers significantly from the fact that data dependencies and the desire for a regular data flow mandate a sequential execution of the individual steps in Alg. 1. Since these steps differ significantly in the number of required operations, a massively parallel architecture would result in a poor utilization of processing resources. In a moderately parallel VLSI architecture the number of processing resources is chosen so that their average utilization is high. Most of the steps in Alg. 1 require either $M_T$ or a multiple of $M_T$ multiplications. Hence, choosing an $M_T$-fold degree of parallelism leads to a high hardware utilization.

A. Moderately Parallel Architecture

The high-level block diagram of the proposed moderately parallel architecture is shown in Fig. 2. The circuit employs $M_T$ identical processing elements (PEs) arranged in a circular array and a common $1/\Sigma$-block that computes the additions in step 4) and the pseudo floating-point division in step 5). The connections in the array are local, meaning that only neighboring PEs are connected with each other. Each PE mainly contains a complex-valued multiplier, an adder and some local storage registers as shown in Fig. 3. All intermediate variables are stored locally, equally distributed over the PEs. For the ...

¹ The index of the OFDM tone has been omitted for brevity.

² In terms of complex-valued multiplications. The few real-valued multiplications are counted as complex-valued, assuming a dedicated VLSI architecture with multipliers optimized for complex-valued coefficients.
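The rank-one update recursion (5) and its expanded form in Alg. 1 can be checked numerically with a minimal floating-point sketch. It assumes that the normalization in Alg. 1 splits $S$ into an exponent $S_e = \lceil \log_2 S \rceil$ and a reciprocal mantissa $S_m = 2^{S_e}/S$. The function name `mmse_by_rank_one_updates` and the example channel are hypothetical, and no fixed-point effects of the hardware are modeled.

```python
import math

def mmse_by_rank_one_updates(H, sigma2):
    """Return (P, G) with P = (H^H H + M_T sigma^2 I)^-1 and G = P H^H.

    H is a list of M_R rows, each holding M_T complex entries.
    Floating-point sketch only; the step numbers refer to Alg. 1.
    """
    MR, MT = len(H), len(H[0])
    # Step 1: P(0) = I / (M_T sigma^2), the trivial inverse of M_T sigma^2 I.
    P = [[complex(1.0 / (MT * sigma2)) if r == c else 0j
          for c in range(MT)] for r in range(MT)]
    for j in range(MR):
        h = H[j]  # jth row of H
        # Step 3: g = P(j-1) h^H (a column vector).
        g = [sum(P[r][c] * h[c].conjugate() for c in range(MT))
             for r in range(MT)]
        # Step 4: S = 1 + h g; real and >= 1 because P stays Hermitian
        # positive definite.
        S = 1.0 + sum(h[c] * g[c] for c in range(MT)).real
        # Step 5: exponent and reciprocal mantissa, so 1/S = S_m * 2^-S_e.
        Se = math.ceil(math.log2(S))
        Sm = 2.0 ** Se / S
        # Step 6: scale g by the reciprocal mantissa.
        g_t = [Sm * x for x in g]
        # Step 7: P(j) = P(j-1) - g_t g^H 2^-S_e (the power of two would be
        # a shift in hardware).
        P = [[P[r][c] - g_t[r] * g[c].conjugate() * 2.0 ** -Se
              for c in range(MT)] for r in range(MT)]
    # Step 9: G = P(M_R) H^H, an M_T x M_R matrix.
    G = [[sum(P[r][k] * H[c][k].conjugate() for k in range(MT))
          for c in range(MR)] for r in range(MT)]
    return P, G

# Example with M_R = 3 receive and M_T = 2 transmit antennas.
H_ex = [[1 + 0.5j, 0.2 - 0.1j],
        [0.3 + 0.2j, 0.9 - 0.4j],
        [-0.4 + 0.1j, 0.6 + 0.7j]]
P_ex, G_ex = mmse_by_rank_one_updates(H_ex, sigma2=0.1)

# Check: (H^H H + M_T sigma^2 I) P should be close to the identity.
A_ex = [[sum(H_ex[j][r].conjugate() * H_ex[j][c] for j in range(3))
         + (2 * 0.1 if r == c else 0) for c in range(2)] for r in range(2)]
AP = [[sum(A_ex[r][k] * P_ex[k][c] for k in range(2)) for c in range(2)]
      for r in range(2)]
print(max(abs(AP[r][c] - (r == c)) for r in range(2) for c in range(2)))
```

The sketch never inverts a matrix directly: each pass of the loop needs only one real reciprocal, which is consistent with the stated goal of minimizing divisions.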


