
Channel estimation for MIMO systems using data-dependent superimposed training

Mounir Ghogho
School of EEE, University of Leeds, LS2 9JT, United Kingdom

Ananthram Swami
AMSRD-ARL-CI-CN, Army Research Laboratory, Adelphi, MD, USA

Abstract

Multiple-antenna systems have been shown to increase system capacity and to provide resilience to random fading. However, these attractive features require accurate channel estimation. This can be achieved by time-division multiplexing (TDM) a sufficiently long training sequence. An alternative to TDM training is the superimposed training (ST) scheme, which trades off power for bandwidth (or rate). However, the performance of ST-based channel estimation is degraded by the unknown embedded data, which acts like input noise. Here, we propose a data-dependent superimposed training (DDST) technique in which the training sequence consists of a known sequence and a data-dependent sequence that is unknown to the receiver. The data-dependent sequence cancels the effect of the unknown data on channel estimation. We consider both spatial multiplexing and space-time coded systems; for the latter, we focus on the Alamouti scheme. The proposed method is compared to the TDM and ST schemes in terms of the mean square error of the channel estimates and the bit error rate. The proposed method is shown to offer good trade-offs between bandwidth and performance.

1 Introduction

Space-time multiplexing of multi-antenna transmissions over fading channels is an effective way of combating fading and enhancing data rates [1, 2]. However, the good performance of multi-input multi-output (MIMO) systems requires more accurate demodulation than that of single-input single/multi-output (SISO/SIMO) systems. Therefore, accurate channel estimation is crucial. In most practical systems, channel estimation is carried out using pilot symbols that are known to the receiver. Often, these pilot symbols are time-division multiplexed (TDM) with the data, i.e., pilots and information-bearing symbols are transmitted in different time slots. TDM (with guard times in the frequency-selective case) ensures that channel estimation can be decoupled from symbol detection. Although accurate channel estimates can be obtained if the number of pilots is large enough [3], this method wastes bandwidth. An alternative is the superimposed training (ST) scheme, in which pilots are added to the data symbols [4, 5, 6]. This scheme saves valuable bandwidth at the expense of a reduction in the information signal-to-noise ratio (SNR), since some of the transmitted energy is allocated to the hidden pilots. ST schemes offer trade-offs between loss of rate and receiver simplicity, between channel estimation and tracking, and possibly improved power efficiency. However, channel estimation based on existing ST methods is degraded by the embedded unknown data, which acts like input noise. Here, we mitigate this effect by distorting the data at the transmitter prior to adding the known ST sequence. The distortion nulls the temporal correlation between the data sequences and the known ST sequences, which fully cancels the effect of the unknown data on the performance of the channel estimator. As we will see later, this technique is equivalent to superimposing both a known training sequence and a data-dependent sequence on the data.
We therefore refer to the proposed method as a data-dependent superimposed training (DDST) scheme. This method is compared to the TDM and ST schemes in terms of the mean square error (MSE) of the channel estimate and the bit error rate (BER). Here, we focus on MIMO frequency-flat channels. The case of single-antenna frequency-selective channels was addressed in [8]; the more general case of MIMO frequency-selective channels will be addressed in a follow-up paper. Although we have mentioned only TDM schemes, the above remarks also apply to FDM schemes (e.g., in OFDM, the data and pilots may be on distinct sub-carriers).

Notation: Superscripts $^*$ and $^\dagger$ denote the Hermitian transpose and the pseudo-inverse. The $L_2$ norm, trace, statistical expectation and Kronecker product are denoted by $\|\cdot\|_2$, $\mathrm{Tr}\{\cdot\}$, $E\{\cdot\}$ and $\otimes$. The $(K\times K)$ identity matrix is denoted by $I_K$, the zero matrix by $\mathbf{0}$, and the $(Q\times Q)$ matrix of all ones by $\mathbf{1}_Q$. $\|X\|_F$ denotes the Frobenius norm of the matrix $X$. The average power of any $(M\times N)$ matrix $X$ is defined as
$$P_X := (MN)^{-1}\,\mathrm{Tr}\{E\{XX^*\}\} = (MN)^{-1}\,E\{\|X\|_F^2\}.$$

2 Signal Model

Consider a single-carrier block transmission system operating over a $K$-transmit, $M$-receive MIMO frequency-flat channel. Let $N$ denote the block length. We assume the channel to be time-invariant over a single block, although it may vary across blocks. At the receiver, the signal model for each block can be expressed as
$$X = HS + V \quad (1)$$

where the $(K\times N)$ and $(M\times N)$ matrices $S$ and $X$ are the transmitted and received blocks, $H$ is the $(M\times K)$ channel matrix to be estimated, whose $(\ell,k)$th element $H_{\ell,k}$ is the channel between the $k$th transmitter and the $\ell$th receiver, and $V$ is an $(M\times N)$ additive noise matrix whose entries are zero-mean, independent and identically distributed Gaussian variables with variance $\sigma^2$. Let $P_s$ denote the power of the transmitted signal. The above model includes both the spatial-multiplexing scenario (used to increase capacity), where $S$ is spatially uncorrelated, and the space-time coding scenario (used to increase spatial diversity over fading channels), as well as combinations of the two. We first study the general case of spatial multiplexing and then the Alamouti scheme [9].

3 Channel Estimation

We consider the channel estimation issue for the TDM, ST, and proposed DDST schemes.

3.1 Time-Division Multiplexing

In a TDM scheme, some of the entries of $S$ are known pilots. Since the channels are assumed flat and time-invariant during each block, the performance of the channel estimator does not depend on where the pilots are placed. The pilots are therefore placed at the beginning of each block, i.e., $S = [C\;\;W]$, where $C$ and $W$ are the $(K\times N_t)$ training and $(K\times N_d)$ data matrices, with $N = N_t + N_d$. The entries of the data matrix are drawn from finite alphabets such as PSK or QAM constellations. Here, data and training have the same average power, i.e., $P_c = P_w = P_s$. Assuming that $C$ has full row rank, the least-squares (LS) estimate of the channel matrix is
$$\hat H = X_t C^*(CC^*)^{-1} = X_t C^\dagger \quad (2)$$

where $X_t$ is the received training matrix, i.e., the leading $(M\times N_t)$ submatrix of $X$. The MSE of the TDM-based channel estimate is given by $M\sigma^2\,\mathrm{Tr}\{E\{(CC^*)^{-1}\}\}$, and is minimized, subject to a fixed training power $P_c$, when [3]
$$CC^* = N_t P_s I_K. \quad (3)$$

The minimum MSE is given by
$$\mathrm{mse}(\hat H) = \frac{KM\sigma^2}{N_t P_s} \quad (4)$$

and clearly increases with the number of unknowns $KM$ and decreases with the SNR $N_t P_s/\sigma^2$. Note that under condition (3), the LS channel matrix estimate simplifies to
$$\hat H = \frac{1}{N_t P_s} X_t C^* \quad (5)$$
which is merely the cross-correlation between $X_t$ and $C$, the received and transmitted training blocks.
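As a rough numerical check of eqs. (2) to (5), the following pure-Python sketch builds an orthogonal training matrix, forms the simplified LS estimate (5), and compares its Monte Carlo MSE with eq. (4). The exponential pilots (borrowed from Section 4), the unit-variance Rayleigh channel entries and the helper names (`exp_training`, `tdm_ls_estimate`, `tdm_mse`) are illustrative assumptions, not part of the paper.

```python
import cmath
import math
import random


def matmul(A, B):
    # Matrices are lists of rows of (complex) numbers.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    # Conjugate (Hermitian) transpose A^*.
    return [[x.conjugate() for x in col] for col in zip(*A)]


def cgauss(rng, var):
    # Circularly symmetric complex Gaussian sample with E|z|^2 = var.
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def exp_training(K, Nt, P):
    # Hypothetical pilot choice: row k is sqrt(P) * exp(j 2 pi k n / K).
    # For Nt a multiple of K this gives C C^* = Nt * P * I_K, i.e. condition (3).
    a = math.sqrt(P)
    return [[a * cmath.exp(2j * math.pi * k * n / K) for n in range(Nt)]
            for k in range(K)]


def tdm_ls_estimate(Xt, C, P):
    # Eq. (5): H_hat = Xt C^* / (Nt P); only valid when C C^* = Nt P I_K.
    Nt = len(C[0])
    return [[x / (Nt * P) for x in row] for row in matmul(Xt, herm(C))]


def tdm_mse(K=2, M=2, Nt=4, Ps=1.0, sigma2=0.1, trials=2000, seed=1):
    # Monte Carlo estimate of E||H_hat - H||_F^2 over Rayleigh channels and noise.
    rng = random.Random(seed)
    C = exp_training(K, Nt, Ps)
    total = 0.0
    for _ in range(trials):
        H = [[cgauss(rng, 1.0) for _ in range(K)] for _ in range(M)]
        Xt = [[x + cgauss(rng, sigma2) for x in row] for row in matmul(H, C)]
        H_hat = tdm_ls_estimate(Xt, C, Ps)
        total += sum(abs(a - b) ** 2
                     for ra, rb in zip(H_hat, H) for a, b in zip(ra, rb))
    return total / trials


# Eq. (4) predicts K*M*sigma2/(Nt*Ps) = 2*2*0.1/(4*1.0) = 0.1.
print("empirical:", tdm_mse(), " eq. (4):", 2 * 2 * 0.1 / (4 * 1.0))
```

Plain lists keep the sketch dependency-free; with a numerical array library the same steps reduce to a few matrix products.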

3.2 Superimposed Training

In the conventional ST scheme, a known training matrix $C$ is added to the data matrix $W$, i.e., $S = W + C$. The total transmitted power is now split between the data and training matrices, i.e., $P_s = P_w + P_c$. We assume that the data symbols, i.e., the entries of $W$, are zero-mean, independent and identically distributed (i.i.d.), and independent of $V$. The i.i.d. assumption is not required for the estimator to be consistent; it is made here only to simplify the performance analysis. The received signal matrix is now given by
$$X = HC + HW + V. \quad (6)$$

The linear least-squares (LLS) estimator of $H$ is then obtained by treating $HW$ as an extra (non-Gaussian, correlated) additive noise term, and is given by eq. (2) after replacing $X_t$ by $X$. Under the assumption of i.i.d. data symbols, it is straightforward to show that the MSE of the LLS estimator is $(P_w\|H\|_F^2 + M\sigma^2)\,\|C^\dagger\|_F^2$, which is minimized when $C$ satisfies the counterpart of condition (3), $CC^* = NP_cI_K$. The resulting LLS estimator is the same as in eq. (5) after replacing $N_t$ by $N$, $P_s$ by $P_c$, and $X_t$ by $X$; its MSE is
$$\mathrm{mse}(\hat H) = \frac{K\left(P_w\,\mathrm{Tr}\{H^*H\} + M\sigma^2\right)}{NP_c}. \quad (7)$$
The above MSE depends on the power of the received data symbols through the term $P_w\,\mathrm{Tr}\{H^*H\}$: the unknown data act like input noise. The MSE therefore exhibits a self-noise effect, i.e., estimation errors exist even in the absence of additive noise. To mitigate this effect, we next propose a new ST method.
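The self-noise effect can be reproduced with a small sketch: with the additive noise switched off, the ST estimate (eq. (5) with $N$, $P_c$ and $X$) still has a non-zero error whose mean should match eq. (7) with $\sigma^2 = 0$. The fixed channel, the QPSK data, the exponential pilots and all function names are assumptions chosen for illustration.

```python
import cmath
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    return [[x.conjugate() for x in col] for col in zip(*A)]


def exp_training(K, N, P):
    # Hypothetical pilots with C C^* = N * P * I_K when K divides N.
    a = math.sqrt(P)
    return [[a * cmath.exp(2j * math.pi * k * n / K) for n in range(N)] for k in range(K)]


def qpsk(rng, K, N, P):
    # i.i.d. QPSK symbols with average power P per entry.
    a = math.sqrt(P / 2)
    return [[complex(rng.choice((-a, a)), rng.choice((-a, a))) for _ in range(N)]
            for _ in range(K)]


def st_self_noise_mse(H, N=32, Ps=1.0, Pc=0.125, trials=2000, seed=2):
    # Conventional ST with NO additive noise: X = H C + H W.
    rng = random.Random(seed)
    K = len(H[0])
    C = exp_training(K, N, Pc)
    Ch = herm(C)
    HC = matmul(H, C)
    total = 0.0
    for _ in range(trials):
        W = qpsk(rng, K, N, Ps - Pc)                  # P_w = P_s - P_c
        X = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(HC, matmul(H, W))]
        H_hat = [[x / (N * Pc) for x in row] for row in matmul(X, Ch)]
        total += sum(abs(a - b) ** 2 for ra, rb in zip(H_hat, H) for a, b in zip(ra, rb))
    return total / trials


H = [[1.0, 0.5j], [-0.3, 0.8]]
frob2 = sum(abs(h) ** 2 for row in H for h in row)
# Eq. (7) with sigma^2 = 0: K * P_w * ||H||_F^2 / (N * P_c)
predicted = 2 * (1.0 - 0.125) * frob2 / (32 * 0.125)
print("empirical:", st_self_noise_mse(H), " eq. (7):", predicted)
```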

3.3 Data-Dependent Superimposed Training

We propose to distort the data matrix at the transmitter prior to adding the known training sequence. Let $\tilde W$ denote the distorted data matrix. The transmitted block matrix is now $S = \tilde W + C$. As in the previous subsection, we estimate the channel matrix using eq. (2). The channel estimation error can be written as $\tilde H := \hat H - H = (H\tilde W + V)C^\dagger$. Hence, to ensure that the channel estimate is independent of the unknown data regardless of the channel realization, we require
$$\tilde W C^* = \mathbf{0}. \quad (8)$$
To simplify symbol detection, we next focus on linear distortion, i.e., $\tilde W = W + E$, where $E$ is a perturbation matrix that depends on $W$. Condition (8) is then satisfied if
$$EC^* = -WC^* := -R \quad (9)$$

where $R$ is the $(K\times K)$ deterministic spatial cross-correlation matrix between $W$ and $C$. The matrix $E$ should be designed so that the deterministic spatial cross-correlation between $W + E$ and $C$ is identically zero. Notice that, unlike $C$, $E$ varies across blocks since it is data-dependent. The method can thus be interpreted as a data-dependent scheme in which the training matrix is the sum of a known matrix, $C$, and a data-dependent matrix, $E$, which is unknown to the receiver. Infinitely many matrices $E$ satisfy condition (9). However, since $E$ behaves like a perturbation of the data matrix, its power should be minimized. The design problem can therefore be formulated as
$$\min_E \|E\|_F^2 \quad \text{subject to} \quad EC^* = -R. \quad (10)$$

The optimal solution is found to be
$$E = -R(CC^*)^{-1}C. \quad (11)$$

Since $C$ should be designed such that $CC^* = NP_cI_K$, the optimal solution for $E$ becomes
$$E = -\frac{1}{NP_c}RC = -\frac{1}{NP_c}WC^*C. \quad (12)$$
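A minimal sketch of eq. (12), assuming the exponential pilots of Section 4 (any $C$ with $CC^* = NP_cI_K$ would do), QPSK data and hypothetical helper names: it computes $E$ from $R = WC^*$, checks condition (8), and confirms that without noise the estimate recovers $H$ exactly, whatever the data.

```python
import cmath
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    return [[x.conjugate() for x in col] for col in zip(*A)]


def exp_training(K, N, P):
    # One admissible choice with C C^* = N P I_K (K divides N); see Section 4.
    return [[math.sqrt(P) * cmath.exp(2j * math.pi * k * n / K) for n in range(N)]
            for k in range(K)]


def ddst_perturbation(W, C, Pc):
    # Eq. (12): E = -(1 / (N Pc)) R C with R = W C^*; assumes C C^* = N Pc I_K.
    N = len(C[0])
    R = matmul(W, herm(C))
    return [[-x / (N * Pc) for x in row] for row in matmul(R, C)]


K, N, Pc, Pw = 2, 32, 0.125, 0.875
rng = random.Random(3)
a = math.sqrt(Pw / 2)
W = [[complex(rng.choice((-a, a)), rng.choice((-a, a))) for _ in range(N)] for _ in range(K)]
C = exp_training(K, N, Pc)
E = ddst_perturbation(W, C, Pc)
W_tilde = [[w + e for w, e in zip(rw, rE)] for rw, rE in zip(W, E)]

# Condition (8): W_tilde C^* = 0.
residual = max(abs(x) for row in matmul(W_tilde, herm(C)) for x in row)

# Noise-free DDST estimate H_hat = X C^* / (N Pc) with X = H (W_tilde + C).
H = [[1.0, 0.5j], [-0.3, 0.8]]
S = [[w + c for w, c in zip(rw, rc)] for rw, rc in zip(W_tilde, C)]
H_hat = [[x / (N * Pc) for x in row] for row in matmul(matmul(H, S), herm(C))]
err = max(abs(p - q) for rp, rq in zip(H_hat, H) for p, q in zip(rp, rq))

power_E = sum(abs(e) ** 2 for row in E for e in row)
power_W = sum(abs(w) ** 2 for row in W for w in row)
print("max|W_tilde C^*| =", residual, " max|H_hat - H| =", err,
      " P_e/P_w ~", power_E / power_W)
```

Because $E$ is the minimum-norm solution of (9), its power stays well below that of $W$ here; Section 4 makes this precise.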

Under condition (8), the channel matrix estimate is unbiased and its MSE is
$$\mathrm{mse}(\hat H) = \frac{KM\sigma^2}{NP_c}. \quad (13)$$

Note that the above MSE is independent of the unknown data. Further, the proposed DDST method has the same performance as the TDM-based estimate when the total energy allocated to training is the same, i.e., $NP_c = N_tP_s$, or
$$\frac{P_c}{P_s} = \frac{N_t}{N}. \quad (14)$$
Recall that, unlike the TDM scheme, the proposed method does not waste bandwidth.
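Eqs. (4), (7), (13) and (14) can be compared directly. The sketch below evaluates them for the parameters later used in Section 7 ($K = M = 2$, $N = 32$, $N_t = 4$, $P_s = 1$, SNR = 15 dB); for ST it replaces $\mathrm{Tr}\{H^*H\}$ by its mean, assumed equal to one (the unit channel-power normalization of footnote 3). The function names are illustrative.

```python
def mse_tdm(K, M, sigma2, Nt, Ps):
    # Eq. (4): minimum MSE of the TDM least-squares estimate.
    return K * M * sigma2 / (Nt * Ps)


def mse_st(K, M, sigma2, N, Pc, Pw, frob2):
    # Eq. (7): conventional ST; frob2 stands for Tr{H^* H} (self-noise term).
    return K * (Pw * frob2 + M * sigma2) / (N * Pc)


def mse_ddst(K, M, sigma2, N, Pc):
    # Eq. (13): DDST; depends neither on the data nor on the channel.
    return K * M * sigma2 / (N * Pc)


K, M, N, Nt, Ps = 2, 2, 32, 4, 1.0
sigma2 = Ps / 10 ** (15 / 10)          # SNR = 15 dB
Pc = Nt / N * Ps                       # training share from eq. (14)
tdm = mse_tdm(K, M, sigma2, Nt, Ps)
ddst = mse_ddst(K, M, sigma2, N, Pc)
st = mse_st(K, M, sigma2, N, Pc, Ps - Pc, frob2=1.0)
print(f"TDM {tdm:.4f}   DDST {ddst:.4f}   ST {st:.4f}")
# prints: TDM 0.0316   DDST 0.0316   ST 0.4691
```

With $P_c/P_s = N_t/N = 0.125$, the TDM and DDST values coincide, while the ST value is dominated by the data self-noise term $P_w\,\mathrm{Tr}\{H^*H\}$.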

4 Efficient Implementation

There are infinitely many matrices $C$ that satisfy the optimality condition in eq. (3). In this section, we design an optimal matrix $C$ that makes the implementation of the proposed DDST method computationally attractive. Assume that $N = KQ$, where $Q$ is an integer, i.e., the block length is chosen to be an integer multiple of the number of transmit antennas. We choose the training sequence at the $k$th transmit antenna to be a complex exponential with frequency $k/K$, i.e., $c(k,n) = \sqrt{P_c}\exp(j2\pi kn/K)$, with $k = 0,\dots,K-1$ and $n = 0,\dots,N-1$. Thus, the superimposed pilot pattern has an OFDM structure with equispaced sub-carriers; each transmit antenna uses a distinct sub-carrier, so that the pilot signals from the different antennas are orthogonal to one another. This facilitates channel estimation¹. The $K\times K$ training pattern is repeated $Q$ times in the block. It is easy to verify that $C$ satisfies the optimality condition (3) in the form $CC^* = NP_cI_K$. Further, $C^*C = NP_cJ$, where $J = \frac{1}{Q}\mathbf{1}_Q\otimes I_K$. Therefore, the perturbation matrix $E$ in eq. (12) turns out to be
$$E = -WJ. \quad (15)$$

¹ It suffices to choose a $K\times K$ training pattern $F$, repeated $Q$ times, with $FF^* = KP_cI_K$. Other obvious choices are Walsh-Hadamard codes.
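The two identities claimed for the exponential training can be checked numerically. This sketch (hypothetical helper names, small illustrative values $K = 2$, $Q = 4$, $P_c = 0.25$) builds $C$, forms $J = \frac{1}{Q}\mathbf{1}_Q\otimes I_K$ with an explicit Kronecker product, and verifies $CC^* = NP_cI_K$ and $C^*C = NP_cJ$.

```python
import cmath
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    return [[x.conjugate() for x in col] for col in zip(*A)]


def kron(A, B):
    # Kronecker product A (x) B of two matrices stored as lists of rows.
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def exp_training(K, Q, Pc):
    # c(k, n) = sqrt(Pc) exp(j 2 pi k n / K), n = 0, ..., N - 1 with N = K Q.
    return [[math.sqrt(Pc) * cmath.exp(2j * math.pi * k * n / K) for n in range(K * Q)]
            for k in range(K)]


K, Q, Pc = 2, 4, 0.25
N = K * Q
C = exp_training(K, Q, Pc)
I_K = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
J = [[x / Q for x in row] for row in kron([[1.0] * Q for _ in range(Q)], I_K)]

CCh = matmul(C, herm(C))   # expected: N Pc I_K, condition (3)
ChC = matmul(herm(C), C)   # expected: N Pc J
dev3 = max(abs(CCh[i][j] - (N * Pc if i == j else 0.0)) for i in range(K) for j in range(K))
devJ = max(abs(ChC[m][n] - N * Pc * J[m][n]) for m in range(N) for n in range(N))
print("max deviation from (3):", dev3, " from N Pc J:", devJ)
```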


Note that if $Q = 1$, then $J = I_K$ and $E = -W$, which results in the pure TDM scheme (the block carries only training). The entries of $E$ are given by
$$e(k, i + mK) = e(k, i) = -\frac{1}{Q}\sum_{q=0}^{Q-1} w(k, i + qK), \qquad i, k = 0, \dots, K-1,\quad m = 0, \dots, Q-1.$$

Thus, the data distortion now reduces to removing the cyclic mean from each transmitted sequence. This simplifies the computation of the proposed DDST scheme not only at the transmitter but also at the symbol detection stage, as we shall see next.
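In code, eq. (15) needs no matrix product: for each antenna, subtract from every sample the mean of the $Q$ samples that share its position modulo $K$. The sketch below (hypothetical name `remove_cyclic_mean`, $\pm1\pm j$ symbols chosen for illustration) compares this $O(KN)$ operation with the explicit product $W - WJ$ and checks orthogonality to the exponential training.

```python
import cmath
import math
import random


def remove_cyclic_mean(rows, K):
    # Returns rows (I - J): every sample minus the mean of the Q samples that
    # share its index modulo K. Applied to W this gives W + E = W - W J.
    out = []
    for row in rows:
        Q = len(row) // K
        means = [sum(row[i + q * K] for q in range(Q)) / Q for i in range(K)]
        out.append([x - means[n % K] for n, x in enumerate(row)])
    return out


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


K, Q = 2, 16
N = K * Q
rng = random.Random(4)
W = [[complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(N)] for _ in range(K)]

fast = remove_cyclic_mean(W, K)                      # O(K N)
J = [[1.0 / Q if m % K == n % K else 0.0 for n in range(N)] for m in range(N)]
WJ = matmul(W, J)                                    # O(K N^2), for comparison only
direct = [[w - v for w, v in zip(rw, rv)] for rw, rv in zip(W, WJ)]
diff = max(abs(a - b) for ra, rb in zip(fast, direct) for a, b in zip(ra, rb))

# Orthogonality to the exponential training of Section 4 (Pc = 1 here).
C = [[cmath.exp(2j * math.pi * k * n / K) for n in range(N)] for k in range(K)]
corr = max(abs(sum(x * c.conjugate() for x, c in zip(rw, rc))) for rw in fast for rc in C)
print("max |fast - direct| =", diff, " max |(W+E) C^*| =", corr)
```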

5 Symbol Detection with DDST

Once the channel matrix has been estimated, one can first remove the contribution of the known training matrix by computing $X - \hat HC$. However, this operation is affected by the channel estimation errors. Instead, we compute
$$Z = X - XJ \quad (16)$$
which is equal to²
$$Z = HW(I - J) + \tilde V \quad (17)$$
where $\tilde V = V(I - J)$ is colored additive noise with power $\tilde\sigma^2 = (1 - 1/Q)\sigma^2$. If $M \ge K$, i.e., there are at least as many receive as transmit antennas, the distorted data matrix can be estimated using a zero-forcing or MMSE equalizer,
$$U = GZ \quad (18)$$

where $G = \hat H^\dagger$ for the zero-forcing equalizer and $G = \left(\hat H^*\hat H + \tilde\sigma^2 I\right)^{-1}\hat H^*$ for the MMSE equalizer. This MMSE equalizer ignores the color of the noise $\tilde V$. We note that for space-time coded systems the condition $M \ge K$ is not required, as we will see in the next section. Due to the data distortion at the transmitter, $U \ne W$ even in the absence of channel estimation errors and noise: in this ideal scenario, $U = W(I - J)$. Since $(I - J)$ is singular, $W$ cannot be recovered linearly. However, exploiting the facts that the data symbols are drawn from a finite alphabet and that $WJ$ is small compared to $W$ (when $Q \gg 1$), symbol detection can be accomplished by finding the matrix $W$ of constellation points that minimizes the Euclidean distance between $U$ and $W(I - J)$. This joint detection scheme is computationally cumbersome, however, and is not considered here. Instead, we propose the following iterative symbol-by-symbol detection scheme. The algorithm is initialized by treating $WJ$ as extra additive noise and considering $U$ in eq. (18) as a soft estimate of $W$; the initial hard decision on $W$ is
$$\bar W^{(0)} = \lfloor U \rfloor,$$
where $\lfloor U \rfloor$ denotes the matrix of constellation points closest to $U$. The detected symbols are used to estimate $WJ$ for the next iteration, so that the detected symbols at the $i$th iteration are given by
$$\bar W^{(i)} = \lfloor U + \bar W^{(i-1)}J \rfloor.$$
As we will see in Section 7, most of the gain in symbol detection performance over existing ST methods is obtained in the very first iteration.

² We have that $XJ = (HW - HWJ + HC + V)J = HC + VJ$, since $J^2 = J$ and $CJ = C$, thanks to the repeated training patterns.
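The detection chain of eqs. (16) to (18) and the iterative slicer can be sketched end to end. Assumptions, all illustrative: $K = M = 2$ with zero-forcing via an explicit $2\times2$ inverse, QPSK with $P_w = 0.875$, $P_c = 0.125$, $Q = 16$, and a fixed well-conditioned channel; the function names are hypothetical. Both $W + E$ and $Z = X - XJ$ are computed by the same cyclic-mean removal, since each is a right-multiplication by $(I - J)$.

```python
import cmath
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    return [[x.conjugate() for x in col] for col in zip(*A)]


def remove_cyclic_mean(rows, K):
    # Right-multiplication by (I - J), J = (1/Q) 1_Q (x) I_K.
    out = []
    for row in rows:
        Q = len(row) // K
        m = [sum(row[i + q * K] for q in range(Q)) / Q for i in range(K)]
        out.append([x - m[n % K] for n, x in enumerate(row)])
    return out


def inv2(A):
    # Zero-forcing for the square 2 x 2 case only (pseudo-inverse = inverse).
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def slicer(U, amp):
    # Nearest QPSK point with per-axis amplitude amp (ties at 0 go to +amp).
    return [[complex(amp if x.real >= 0 else -amp, amp if x.imag >= 0 else -amp)
             for x in row] for row in U]


def count_errors(A, B):
    return sum(a != b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def ddst_detect(H, sigma2, Q=16, Pc=0.125, Pw=0.875, iters=2, seed=5):
    rng = random.Random(seed)
    K = len(H[0])
    N = K * Q
    amp = math.sqrt(Pw / 2)
    W = [[complex(rng.choice((-amp, amp)), rng.choice((-amp, amp))) for _ in range(N)]
         for _ in range(K)]
    C = [[math.sqrt(Pc) * cmath.exp(2j * math.pi * k * n / K) for n in range(N)]
         for k in range(K)]
    S = [[d + c for d, c in zip(rd, rc)] for rd, rc in zip(remove_cyclic_mean(W, K), C)]
    s = math.sqrt(sigma2 / 2)
    X = [[x + complex(rng.gauss(0, s), rng.gauss(0, s)) for x in row]
         for row in matmul(H, S)]
    H_hat = [[x / (N * Pc) for x in row] for row in matmul(X, herm(C))]  # eq. (5)
    Z = remove_cyclic_mean(X, K)            # eq. (16): Z = X - X J
    U = matmul(inv2(H_hat), Z)              # eq. (18), zero-forcing
    W_bar = slicer(U, amp)                  # initial hard decisions
    errors = [count_errors(W_bar, W)]
    for _ in range(iters):
        # W_bar J equals W_bar minus its cyclic-mean-removed version.
        WJ = [[w - v for w, v in zip(rw, rv)]
              for rw, rv in zip(W_bar, remove_cyclic_mean(W_bar, K))]
        W_bar = slicer([[u + v for u, v in zip(ru, rv)] for ru, rv in zip(U, WJ)], amp)
        errors.append(count_errors(W_bar, W))
    return errors


H = [[1.0, 0.5j], [-0.3, 0.8]]
print("noise-free:", ddst_detect(H, 0.0), " sigma2=1e-3:", ddst_detect(H, 1e-3))
```

`ddst_detect` returns the number of symbol errors after the initial decision and after each iteration. With `sigma2=0.0` and a correct previous decision, $U + \bar WJ$ equals $W$ exactly, which is why the iterations converge so quickly in this idealized setting.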


SNR degradation. Since $W$ has uncorrelated entries and $E = -WJ$, each entry of $E$ is (minus) the average of $Q$ uncorrelated data symbols. The average power of the perturbation $E$ can therefore be expressed as
$$P_e := (KN)^{-1}\,\mathrm{Tr}\{E\{EE^*\}\} = \frac{P_w}{Q}, \quad (19)$$
which is a decreasing function of $Q$, as expected. The penalty of superimposing training on the data is often expressed in terms of an SNR loss, defined as
$$\gamma := \frac{P_c + P_e}{P_s} = \frac{P_c}{P_s} + \frac{1}{Q}\frac{P_w}{P_s}, \quad (20)$$
where $P_s$ is the total transmitted power. From the above equation, we see that, in addition to the SNR loss due to the known training, the proposed scheme incurs an additional loss due to the data-dependent training. However, this loss is not significant when $Q \gg 1$. Since $E$ and $W$ are negatively correlated, the total transmitted power is in fact slightly smaller than $P_c + P_w + P_e$: it is easy to verify that
$$P_s = P_c + P_w\left(1 - \frac{1}{Q}\right).$$
We can thus write the data power as $P_w = P_s(1-\alpha)/(1 - 1/Q)$, where $\alpha = P_c/P_s$ is the fraction of power devoted to training. Since $E$ is unknown to the receiver, its impact can be modelled as an extra additive noise term affecting symbol detection. It is useful to define the information SNR as
$$\mathrm{ISNR} := \frac{P_w}{P_e + \sigma^2} = \frac{1-\alpha}{1 - \frac{1}{Q}}\cdot\frac{\mathrm{SNR}}{1 + \mathrm{SNR}\,\frac{1-\alpha}{Q-1}}, \quad (21)$$
where $\mathrm{SNR} = P_s/\sigma^2$ is the total signal-to-noise ratio³. At very high SNR, the ISNR saturates at $Q$, so there is a degradation unless longer blocks (larger $Q$) are used. This analysis does not take into account the effect of channel estimation errors. The above ISNR is smaller than that of conventional superimposed training (i.e., $E = 0$), which equals $(1-\alpha)\,\mathrm{SNR}$, whenever $(1-\alpha)\,\mathrm{SNR} > 1$; the difference between the two ISNRs is small when $Q \gg 1$. Note that $Q$ is limited by the channel coherence time. Figure 1 shows the SNR degradation, i.e., the ratio ISNR/SNR, as a function of the number of training blocks $Q$; the dashed lines are the low-SNR asymptotes, and the solid lines correspond to a nominal SNR of 10 dB.
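Eq. (21) is easy to tabulate. The sketch evaluates it both in closed form and from the power definitions ($P_w$, eq. (19) for $P_e$, and $\sigma^2 = P_s/\mathrm{SNR}$) and prints the degradation in dB at the nominal 10 dB SNR of Figure 1; the training fraction $\alpha = 0.1$ and the function names are arbitrary illustrative choices.

```python
import math


def isnr(snr, alpha, Q):
    # Eq. (21), closed form (channel power normalized to one).
    return (1 - alpha) / (1 - 1 / Q) * snr / (1 + snr * (1 - alpha) / (Q - 1))


def isnr_from_powers(snr, alpha, Q, Ps=1.0):
    # Same quantity from its definition ISNR = P_w / (P_e + sigma^2).
    Pw = Ps * (1 - alpha) / (1 - 1 / Q)
    Pe = Pw / Q                        # eq. (19)
    sigma2 = Ps / snr
    return Pw / (Pe + sigma2)


snr = 10 ** (10 / 10)                  # nominal 10 dB, as for the solid curves of Figure 1
alpha = 0.1                            # illustrative training fraction
for Q in (2, 4, 8, 16, 32):
    loss_db = 10 * math.log10(isnr(snr, alpha, Q) / snr)
    print(f"Q = {Q:2d}: ISNR/SNR = {loss_db:+.2f} dB  "
          f"(conventional ST: {10 * math.log10(1 - alpha):+.2f} dB)")
```

Letting SNR grow without bound in `isnr` reproduces the saturation at $Q$, and letting it shrink gives the low-SNR ratio $(1-\alpha)Q/(Q-1)$ that the dashed asymptotes of Figure 1 represent.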

³ The power of the channel matrix is assumed to be unity, i.e., $\mathrm{Tr}\{E\{HH^*\}\} = 1$, without loss of generality.

6 Alamouti Space-Time Coding

Here, we assume $N$ to be even. In the Alamouti space-time coding scheme, $K = 2$, $M = 1$ and
$$S = \frac{1}{2}\begin{pmatrix} s_0 & -s_1^* & s_2 & -s_3^* & \cdots & s_{N-2} & -s_{N-1}^* \\ s_1 & s_0^* & s_3 & s_2^* & \cdots & s_{N-1} & s_{N-2}^* \end{pmatrix}$$
where the $s_n$'s are the entries of an $(N\times 1)$ vector $s$. For the DDST scheme, $s$ is given by $s = w + c + e$, where $w$ is the $(N\times 1)$ data vector, $c$ is the known training vector and $e$ is the data-dependent training vector, which is unknown to the receiver. Thus, $W$, $C$ and $E$ have the same structure as $S$ given above. Let $w_n$, $c_n$ and $e_n$ denote the $n$th elements of $w$, $c$ and $e$. Condition (3) is satisfied if
$$\sum_{n=0}^{N-1}|c_n|^2 = NP_c \quad\text{and}\quad \sum_{i=0}^{N/2-1}\left[c_{2i}c_{2i+1}^* - c_{2i+1}^*c_{2i}\right] = 0.$$
Following Section 4, we choose the training sequence to be $c_n = \sqrt{P_c}\,(-1)^n$, $n = 0,\dots,N-1$. The data-dependent sequence is then found to be
$$e_n = \begin{cases} -\dfrac{2}{N}\displaystyle\sum_{i=0}^{N/2-1} w_{2i}, & n \text{ even},\\[2ex] -\dfrac{2}{N}\displaystyle\sum_{i=0}^{N/2-1} w_{2i+1}, & n \text{ odd}.\end{cases}$$
In matrix form, the distorted data vector can be expressed as $w + e = (I - J)w$, where $J = \frac{2}{N}\mathbf{1}_{N/2}\otimes I_2$. Once the channel has been estimated and the known training-related term has been removed from the received signal as in eq. (16), the distorted data vector is estimated as in [9]. Let $u$ denote this estimate. Note that, in the absence of channel estimation errors and additive noise, $u = (I - J)w$. Symbol detection is carried out using the following iterative technique. The algorithm is initialized by treating $Jw$ as extra additive noise and considering $u$, the output of the Alamouti demodulator, as a soft estimate of $w$; the initial hard decision on $w$ is
$$\bar w^{(0)} = \lfloor u \rfloor.$$
The detected symbols are used to estimate $Jw$ for the next iteration, so that the detected symbols at the $i$th iteration are given by
$$\bar w^{(i)} = \lfloor u + J\bar w^{(i-1)} \rfloor.$$
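A small sketch of the Alamouti DDST construction, assuming QPSK data, $N = 32$, $P_c = 0.125$ and hypothetical helper names. It omits the scalar normalization in front of $S$ (a common scale factor does not affect orthogonality), computes $e_n$ as above, and checks that the Alamouti blocks built from $w + e$ and $c$ satisfy $\tilde WC^* = \mathbf{0}$ and $CC^* = NP_cI_2$.

```python
import math
import random


def alamouti(s):
    # 2 x N Alamouti block built from the length-N vector s (N even);
    # the scalar normalization in front of S is omitted.
    top, bot = [], []
    for i in range(0, len(s), 2):
        top += [s[i], -s[i + 1].conjugate()]
        bot += [s[i + 1], s[i].conjugate()]
    return [top, bot]


def data_dependent_sequence(w):
    # e_n = -(2/N) * sum of the even-indexed (n even) or odd-indexed (n odd) data.
    N = len(w)
    even = -2 / N * sum(w[0::2])
    odd = -2 / N * sum(w[1::2])
    return [even if n % 2 == 0 else odd for n in range(N)]


def gram(A, B):
    # A B^* for two blocks with the same number of columns.
    return [[sum(a * b.conjugate() for a, b in zip(ra, rb)) for rb in B] for ra in A]


N, Pc, Pw = 32, 0.125, 0.875
rng = random.Random(6)
amp = math.sqrt(Pw / 2)
w = [complex(rng.choice((-amp, amp)), rng.choice((-amp, amp))) for _ in range(N)]
c = [math.sqrt(Pc) * (-1) ** n for n in range(N)]
e = data_dependent_sequence(w)
d = [wn + en for wn, en in zip(w, e)]            # distorted data (I - J) w

cross = gram(alamouti(d), alamouti(c))          # W_tilde C^*: should vanish
CC = gram(alamouti(c), alamouti(c))             # C C^*: should be N Pc I_2
print("max|W_tilde C^*| =", max(abs(x) for row in cross for x in row), " C C^* =", CC)
```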

7 Simulation Results

Here, we illustrate the performance of the proposed method when applied to a $2\times 2$ spatial multiplexing system, i.e., $K = M = 2$. The block length is $N = 32$. The data symbols are drawn randomly and independently from a QPSK constellation. The channel coefficients are independent and Rayleigh distributed, and the channel matrix is regenerated at each Monte Carlo run. We compare the proposed method with the TDM-based method and the data-independent superimposed training technique in terms of channel MSE and symbol error rate (SER). The SERs of the three methods are also compared to the SER obtained when the channel is perfectly known and full power is assigned to the information data. Figure 2 shows the MSEs of the different channel estimates versus $P_c$. The total transmit power was normalized to one, i.e., $P_s = 1$. The block length was $N = 32$ and the training period for the TDM scheme was chosen to be $N_t = N/8 = 4$. As expected from the theory, the TDM-based and DDST channel estimates have the same performance when $P_c/P_s = N_t/N$. For this case ($P_c = 0.125$), the SERs of the different methods versus the SNR are displayed in Figure 4. Figure 3 depicts the SERs versus $P_c$ when SNR = 15 dB. We see that the proposed DDST method always outperforms the conventional ST technique. Further, with an adequate choice of $P_c$, the performance of DDST is very close to that of the TDM method; recall that the latter wastes bandwidth, unlike the DDST approach. The SERs versus the block length $N$ are given in Figure 5 for SNR = 15 dB, $N_t = 8$ and $P_c = 0.1$. In this scenario, even with a fairly small block length ($N \ge 36$), the proposed method provides almost the same SER performance as the TDM method. This implies that the DDST technique can be used in scenarios where the coherence time of the channel is quite short.
However, for higher-order constellations and/or a larger number of transmit antennas, $N$ will need to be relatively larger for the data-dependent perturbation matrix not to affect the SER significantly.
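The channel-MSE comparison at one point of Figure 2 can be approximated with a short Monte Carlo sketch. It follows the stated setup ($K = M = 2$, $N = 32$, $N_t = 4$, QPSK, Rayleigh channel) and assumes $P_c = 0.125$, SNR = 15 dB, exponential pilots, a channel normalized as in footnote 3, and a DDST data power chosen so that $P_s = 1$; it is not the authors' simulation code, and the function names are hypothetical.

```python
import cmath
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def herm(A):
    return [[x.conjugate() for x in col] for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sqerr(A, B):
    # Squared Frobenius norm of A - B.
    return sum(abs(a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def cgauss(rng, var):
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def pilots(K, L, P):
    # Exponential pilots; C C^* = L * P * I_K when K divides L.
    return [[math.sqrt(P) * cmath.exp(2j * math.pi * k * n / K) for n in range(L)]
            for k in range(K)]


def qpsk(rng, K, L, P):
    a = math.sqrt(P / 2)
    return [[complex(rng.choice((-a, a)), rng.choice((-a, a))) for _ in range(L)]
            for _ in range(K)]


def ls(X, C, P):
    # Eq. (5) pattern: X C^* / (L P), with L the number of columns of C.
    return [[x / (len(C[0]) * P) for x in row] for row in matmul(X, herm(C))]


def channel_mse(K=2, M=2, N=32, Nt=4, Pc=0.125, snr_db=15.0, trials=1000, seed=7):
    rng = random.Random(seed)
    Ps, Q = 1.0, N // K
    sigma2 = Ps / 10 ** (snr_db / 10)

    def noise(L):
        return [[cgauss(rng, sigma2) for _ in range(L)] for _ in range(M)]

    C_tdm, C_sup = pilots(K, Nt, Ps), pilots(K, N, Pc)
    mse = {"TDM": 0.0, "ST": 0.0, "DDST": 0.0}
    for _ in range(trials):
        # Rayleigh channel with E{Tr H H^*} = 1 (footnote 3).
        H = [[cgauss(rng, 1.0 / (K * M)) for _ in range(K)] for _ in range(M)]
        mse["TDM"] += sqerr(ls(add(matmul(H, C_tdm), noise(Nt)), C_tdm, Ps), H)
        W = qpsk(rng, K, N, Ps - Pc)                      # ST: Ps = Pw + Pc
        mse["ST"] += sqerr(ls(add(matmul(H, add(W, C_sup)), noise(N)), C_sup, Pc), H)
        W = qpsk(rng, K, N, (Ps - Pc) / (1 - 1 / Q))     # DDST: Ps = Pc + Pw (1 - 1/Q)
        # Eq. (15): W + E = W - W J, i.e. remove the cyclic mean of period K.
        Wt = [[x - sum(r[n % K + q * K] for q in range(Q)) / Q for n, x in enumerate(r)]
              for r in W]
        mse["DDST"] += sqerr(ls(add(matmul(H, add(Wt, C_sup)), noise(N)), C_sup, Pc), H)
    return {name: total / trials for name, total in mse.items()}


print(channel_mse())
```

With $P_c/P_s = N_t/N$, the TDM and DDST averages should both approach $KM\sigma^2/(N_tP_s)$ from eq. (4), while the ST average stays well above them because of the self-noise term in eq. (7).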

References

[1] T. L. Marzetta and B. M. Hochwald, "Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading," IEEE Trans. Information Theory, 45, pp. 139-157, Jan. 1999.
[2] D. Gesbert, M. Shafi, D.-S. Shiu, P. J. Smith, and A. Naguib, "From theory to practice: an overview of MIMO space-time coded wireless systems," IEEE Journal on Selected Areas in Communications, 21(3), pp. 281-302, April 2003.
[3] B. Hassibi and B. M. Hochwald, "How much training is needed in multiple-antenna wireless links?," IEEE Trans. Information Theory, 49(4), pp. 951-973, April 2003.
[4] G. T. Zhou, M. Viberg, and T. McKelvey, "A first-order statistical method for channel estimation," IEEE Signal Processing Lett., 10(3), March 2003.
[5] J. Tugnait and W. Luo, "On channel estimation using superimposed training and first-order statistics," IEEE Communications Lett., 7(9), pp. 413-416, Sept. 2003.
[6] A. G. Orozco-Lugo, M. M. Lara, and D. C. McLernon, "Channel estimation using implicit training," IEEE Trans. Signal Processing, 52(1), Jan. 2004.
[7] L. Tong, B. M. Sadler, and M. Dong, "Pilot-assisted wireless transmissions," to appear in IEEE Signal Processing Magazine, November 2004.
[8] M. Ghogho, D. C. McLernon, E. Alameda-Hernandez, and A. Swami, "Channel estimation and symbol detection for block transmission using data-dependent superimposed training," to appear in IEEE Signal Processing Lett., 2004.
[9] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, 16(8), pp. 1451-1458, Oct. 1998.

Figure 1: The ISNR degradation ratio, ISNR/SNR (dB), vs. the number of blocks Q (curves labelled 0.0 to 0.5); solid lines correspond to an SNR of 10 dB; dashed lines are low-SNR asymptotes.


Figure 2: MSE of channel estimates vs. the power fraction for training, Pc (K = M = 2; N = 32, Nt = 4; SNR = 15 dB; QPSK; curves: TDM, ST, DDST).

Figure 3: Symbol error rate versus the power fraction for training, Pc (K = M = 2; N = 32, Nt = 4; SNR = 15 dB; QPSK; curves: TDM, ST, DDST iterations 0, 1 and 2, known channel).


Figure 4: Symbol error rate versus SNR (K = M = 2; N = 32; Nt = 4; Pc = 0.125; QPSK).

Figure 5: Symbol error rate versus the block length N (K = M = 2; Nt = 8; Pc = 0.1; SNR = 15 dB; QPSK).
