Optimized Training Sequences for Spatially Correlated MIMO-OFDM

Hoang D. Tuan, Member, IEEE, Ha H. Kha, Ha H. Nguyen, Senior Member, IEEE, and V.-J. Luong
Abstract—In this paper, the training sequence design for multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems under the minimum mean square error (MMSE) criterion is addressed. The optimal training sequence for channel estimation in spatially correlated MIMO-OFDM systems was not known for an arbitrary signal-to-noise ratio (SNR). Only one class of training sequences was proposed in the literature, in which the power allocation is given only for the extreme conditions of low and high SNRs. The current paper presents a necessary and sufficient condition for the optimal training sequence, and reformulates the training design problem as a convex optimization problem whose optimal solution is efficiently solved. In addition, tight upper bounds for the MMSE, and the resulting low-complexity iterative algorithms with closed-form expressions in each iteration to find the optimum training sequence, are derived. Simulation results confirm the superiority of the proposed design over the existing one in terms of both MSE and BER performance. The proposed methods are also shown to be robust with respect to spatial correlation mismatch at the transmitter.

Index Terms—MIMO-OFDM, MMSE channel estimation, training sequence, spatially correlated fading, convex optimization.
I. INTRODUCTION

In coherent MIMO-OFDM systems, channel state information is required for data detection at the receiver. In practical wireless communications, channel parameters can be estimated by sending training sequences to the receiver. In general, the accuracy of channel estimation, which directly affects the overall system performance, relies heavily on the design of the training sequences. Training sequence designs for narrowband frequency-flat fading MIMO systems have been extensively studied; see, for example, [7], [9], [11], [17] and references therein. In this paper, we consider broadband MIMO channels together with the use of OFDM modulation. Although OFDM techniques can transform broadband frequency-selective fading channels into a set of parallel flat fading sub-channels, the channel estimation of MIMO-OFDM systems is more challenging than that of narrowband MIMO systems. One reason is that the number of channel parameters to be estimated is proportional to the channel delay spread and the number of antennas. Another reason is that each received signal depends on multiple channel parameters [10], [28]. Several methods for MIMO-OFDM channel estimation have been reported in [5], [10], [14], [26]. Most of these methods have not exploited or taken into account the spatial correlations in the design of optimal training sequences. However, in certain MIMO systems where the transmit antennas and/or the receive antennas cannot be placed sufficiently far apart due to physical constraints, the MIMO channels are spatially correlated. In such systems, developing an accurate channel estimation method becomes a more challenging task. The superimposed training design for spatially correlated channels was presented in [21]. In this paper, we focus on the optimal design of time-division multiplexed (TDM) training sequences for channel estimation. To the best of our knowledge, only the work in [28] considers the TDM training sequence design for spatially correlated MIMO-OFDM channel estimation, and only in the following cases:
∙ There is one dominant tap over all the other taps. The training sequences are designed according to the correlation of the dominant tap only. As a consequence, the frequency-selective fading MIMO channel is essentially treated as a flat fading channel.
∙ All the transmit correlation matrices are scaled versions of a fixed matrix (see (20) for more details).

Manuscript received August 20, 2009; revised November 24, 2009; accepted April 20, 2010. The associate editor coordinating the review of this paper and approving it for publication was D. Dardari. H. D. Tuan, H. H. Kha, and V.-J. Luong are with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia (e-mail: {h.d.tuan, h.k.ha, d.v.luong}@unsw.edu.au). H. H. Nguyen is with the Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Canada (e-mail: [email protected]). The preliminary result of this paper appeared in [22]. This work was partially supported by Australian Research Council (ARC) Discovery Project 0772548. Digital Object Identifier 10.1109/TWC.2010.070710.081601
Furthermore, the above special cases are analyzed only in the extreme scenarios of low and high signal-to-noise ratios: the total power is entirely assigned to the best eigen-mode of the transmit correlation at low SNR, while equal powers are allocated to all eigen-modes at high SNR (equi-power allocation). In this paper we focus on the optimal design of training sequences for MIMO-OFDM systems with the general case of spatial correlations. To formulate the optimization problem of the training sequence design, we adopt the minimum mean square error (MMSE) criterion subject to a power constraint. In contrast to the results in [28], which are obtained for extreme cases, our proposed methods yield the optimum training sequence for arbitrary spatial correlation matrices and at any SNR value. Specifically, the main contributions of this paper are as follows. First, a necessary and sufficient condition for the optimal training sequence is obtained. The conventional MMSE formulation for the design of training sequences in MIMO-OFDM is not easily solved because of its Kronecker product structure. However, we show that the set of possible optimal training sequences can be parameterized as the product of a fixed matrix with a new matrix variable. As
a result, we can recast the optimization problem of the training sequence design as a convex program, which is solvable by existing software using interior-point methods of polynomial complexity. Second, we introduce tight upper bounds for the objective function in the convex optimization such that the approximate problems can be efficiently solved by low-complexity iterative algorithms. Inspired by our recent results in [15], [23] for flat fading MIMO systems, efficient procedures for finding the optimal training sequences are developed. Our analysis and results are verified by computer simulations.

The rest of the paper is organized as follows. Section II presents the mathematical model of channel estimation for spatially correlated MIMO-OFDM systems and discusses the challenging design issue. Formulation of the training sequence design based on convex programming is carried out in Section III. Section IV describes tractable optimization algorithms to optimize the training sequences for general correlation matrices, while efficient algorithms for special spatial correlations are presented in Section V. The issue of computational robustness is treated in Section VI. Simulation results are provided in Section VII and conclusions are drawn in Section VIII.

Notations: Bold capital and lower case letters denote matrices and column vectors, respectively. (⋅)^T and (⋅)^H denote the transpose and Hermitian transpose operations, respectively. The symbol ⊗ is used for the Kronecker product of two matrices and vec(X) denotes the vectorization of matrix X. X ≥ 0 (X > 0, resp.) means that X is Hermitian and positive semi-definite (positive definite, resp.). I_n is the identity matrix of dimension n × n. The expectation operation is E{⋅}, while 𝒞𝒩(0, σ²) denotes a circularly symmetric complex Gaussian random variable. Furthermore, [A_{ij}]_{i,j=0,1,...,N} with matrices A_{ij} constructs a matrix with block entries A_{ij}. Analogously, diag[A_i]_{i=0,1,...,N} constructs a block-diagonal matrix with diagonal blocks A_i and zero off-diagonal blocks. The following properties are frequently used in the paper: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD); (A ⊗ B)^H = A^H ⊗ B^H; [A_1 A_2] ⊗ B = [A_1 ⊗ B  A_2 ⊗ B] and, analogously for vertical stacking, [A_1; A_2] ⊗ B = [A_1 ⊗ B; A_2 ⊗ B]; (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}; tr(A ⊗ B) = tr(A)tr(B), where tr(⋅) denotes the trace of a matrix; vec(AXB) = (B^T ⊗ A)vec(X); A ⊗ B > 0 for A > 0 and B > 0; A_1 ⊗ B > A_2 ⊗ B for A_1 > A_2 > 0 and B > 0.

II. MIMO-OFDM SYSTEMS AND CHANNEL ESTIMATION

Consider a frequency-selective fading MIMO communication channel described by the following transfer matrix:

H(z) = \sum_{\ell=0}^{L-1} H_\ell z^{-\ell},
where each matrix Hℓ ∈ ℂ𝑀𝑟 ×𝑀𝑡 represents the gains of the ℓth MIMO path. The elements (Hℓ )𝑚,𝑛 of Hℓ are (possibly correlated) circularly symmetric complex Gaussian random variables that remain unchanged over the period of channel estimation. Assuming that the spatial correlations of MIMO channels can be modelled by a Kronecker structure
with separable transmit and receive antenna correlations, the matrix-valued channel taps can be written as [1]:

H_\ell = R_{r\ell}^{1/2} H_{w\ell} (R_{t\ell}^{1/2})^T, \quad \ell = 0, 1, \ldots, L-1,   (1)
where H_{w\ell} ∈ ℂ^{M_r × M_t} are uncorrelated matrices with independent and identically distributed 𝒞𝒩(0, 1) entries, and R_{t\ell} = R_{t\ell}^{1/2} R_{t\ell}^{1/2} and R_{r\ell} = R_{r\ell}^{1/2} R_{r\ell}^{1/2} are the deterministic Hermitian transmit and receive correlation matrices, respectively. For the (uncoded) MIMO-OFDM system with N subcarriers, each of the M_t data sequences is divided into blocks of N symbols. Each block goes through an OFDM modulator to form an OFDM block and is transmitted via one transmit antenna. The OFDM cyclic prefix length is chosen to be no smaller than the channel order L − 1 to avoid inter-block interference (IBI). By defining W_N = e^{-\jmath 2\pi/N}, the channel transfer function corresponding to the k-th sub-channel is

H_f(k) = H(W_N^k) = \sum_{\ell=0}^{L-1} H_\ell W_N^{\ell k}, \quad k = 0, 1, \ldots, N-1.   (2)
The normalized input-output equation for each sub-carrier is

r(k) = \sqrt{\rho/M_t}\, H_f(k) s(k) + n(k), \quad k = 0, 1, \ldots, N-1,   (3)

where r(k) = (r_0(k), r_1(k), \ldots, r_{M_r-1}(k))^T ∈ ℂ^{M_r} is the k-th received signal vector, s(k) = (s_0(k), s_1(k), \ldots, s_{M_t-1}(k))^T ∈ ℂ^{M_t} is the transmitted signal vector, and n(k) = (n_0(k), n_1(k), \ldots, n_{M_r-1}(k))^T represents additive white Gaussian noise (AWGN) whose elements are i.i.d. 𝒞𝒩(0, 1) random variables. In channel estimation, the channel tap matrices H_\ell, \ell = 0, 1, \ldots, L-1, or, equivalently, the vector h = (vec(H_0)^T, vec(H_1)^T, \ldots, vec(H_{L-1})^T)^T, are estimated at the receiver based on the received signals r(k), k = 0, 1, \ldots, N-1, and the known transmitted signals s(k), k = 0, 1, \ldots, N-1, which are called the training sequences. The training design problem is to find s(k), k = 0, 1, \ldots, N-1, or, equivalently, S = [s(0)\; s(1)\; \ldots\; s(N-1)]^T ∈ ℂ^{N × M_t}, to optimize some estimation criterion, such as the least mean-square error (LMSE) or minimum mean-square error (MMSE), under the following normalized power constraint:

tr\{S^H S\} = \sum_{k=0}^{N-1} ||s(k)||^2 = N M_t.   (4)

This paper adopts the MMSE criterion and the optimization problem is formulated in more detail next. First, rewrite Equation (3) as

r(k) = \sqrt{\rho/M_t}\, M(s(k))\, h + n(k), \quad k = 0, 1, \ldots, N-1,

where

M(s(k)) = \left[ W_N^{k\cdot 0} s^T(k)\;\; W_N^{k\cdot 1} s^T(k)\;\; \ldots\;\; W_N^{k\cdot(L-1)} s^T(k) \right] \otimes I_{M_r}.

Then one has

r = \sqrt{\rho/M_t}\, M(S)\, h + n,   (5)
where

r = (r(0)^T, r(1)^T, \ldots, r(N-1)^T)^T ∈ ℂ^{N M_r}, \qquad n = (n(0)^T, n(1)^T, \ldots, n(N-1)^T)^T ∈ ℂ^{N M_r},

M(S) = \begin{bmatrix} M(s(0)) \\ M(s(1)) \\ \vdots \\ M(s(N-1)) \end{bmatrix} = [F_0 S \;\; F_1 S \;\; \ldots \;\; F_{L-1} S] \otimes I_{M_r} ∈ ℂ^{N M_r \times L M_t M_r},   (6)

and F_\ell = diag\{W_N^{k\ell}\}_{k=0,1,\ldots,N-1}, \ \ell = 0, 1, \ldots, L-1,

M^H(S) M(S) = [(S^H F_\ell^H F_q S) \otimes I_{M_r}]_{\ell,q=0,1,\ldots,L-1}.   (7)
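To make the structure of (6)-(7) concrete, the following is a minimal NumPy sketch (an illustration, not part of the original paper) that builds F_\ell and M(S) for small, arbitrary dimensions and numerically checks the block form of M^H(S)M(S) in (7); the variable names simply mirror the symbols in the text.

```python
import numpy as np

def build_M(S, L, M_r):
    """Construct M(S) = [F_0 S  F_1 S ... F_{L-1} S] kron I_{M_r} as in (6)."""
    N, M_t = S.shape
    W_N = np.exp(-2j * np.pi / N)
    # F_l = diag{W_N^{k*l}}, k = 0,...,N-1, as defined after (6)
    F = [np.diag(W_N ** (np.arange(N) * l)) for l in range(L)]
    FS = np.hstack([F[l] @ S for l in range(L)])            # N x (L*M_t)
    return np.kron(FS, np.eye(M_r))                          # (N*M_r) x (L*M_t*M_r)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M_t, M_r, L = 8, 2, 2, 3                              # toy dimensions with N >= L*M_t
    S = (rng.standard_normal((N, M_t)) + 1j * rng.standard_normal((N, M_t))) / np.sqrt(2)
    M = build_M(S, L, M_r)
    # Check the block structure (7): block (l, q) equals (S^H F_l^H F_q S) kron I_{M_r}
    W_N = np.exp(-2j * np.pi / N)
    F = [np.diag(W_N ** (np.arange(N) * l)) for l in range(L)]
    blk = lambda l, q: np.kron(S.conj().T @ F[l].conj().T @ F[q] @ S, np.eye(M_r))
    rhs = np.block([[blk(l, q) for q in range(L)] for l in range(L)])
    print(np.allclose(M.conj().T @ M, rhs))                  # expected: True
```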
Based on the following singular value decompositions (SVDs)

R_{t\ell} = U_\ell \Lambda_\ell U_\ell^H, \qquad R_{r\ell} = V_\ell \Upsilon_\ell V_\ell^H,   (8)

the correlation matrix of the channel vector h is

R_h = diag[R_{t\ell} \otimes R_{r\ell}]_{\ell=0,1,\ldots,L-1} = U_h \Lambda_h U_h^H,   (9)

where

U_h = diag[U_\ell \otimes V_\ell]_{\ell=0,1,\ldots,L-1}, \qquad \Lambda_h = diag[\Lambda_\ell \otimes \Upsilon_\ell]_{\ell=0,1,\ldots,L-1}.   (10)
On the other hand, the correlation matrix of the noise n is R_n = I_{N M_r}. The MMSE estimate of h is therefore given by [18]

\hat{h} = \sqrt{\rho/M_t}\, R_h M^H(S) \left( I + \frac{\rho}{M_t} M(S) R_h M^H(S) \right)^{-1} r,   (11)

and the corresponding MMSE, 𝒞_e(S) = E(||h − \hat{h}||^2), can be shown to be

𝒞_e(S) = tr\left\{ R_h − \frac{\rho}{M_t} R_h M^H(S) \left( I + \frac{\rho}{M_t} M(S) R_h M^H(S) \right)^{-1} M(S) R_h \right\}.   (12)

The optimal training sequence design is to find S ∈ ℂ^{N × M_t} that minimizes 𝒞_e(S), subject to the power constraint (4):

\min_{S ∈ ℂ^{N × M_t}} 𝒞_e(S) : \quad tr\{S^H S\} = N M_t.   (13)
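As an illustration only (not from the paper), the short NumPy sketch below evaluates the MMSE cost 𝒞_e(S) of (12) for a randomly drawn training matrix S that satisfies the power constraint (4); the per-tap correlation matrices here are arbitrary synthetic stand-ins for R_{t\ell} and R_{r\ell}.

```python
import numpy as np

def mmse_cost(S, R_h, rho, L, M_r):
    """MMSE channel-estimation error C_e(S) of (12), assuming R_n = I."""
    N, M_t = S.shape
    W_N = np.exp(-2j * np.pi / N)
    F = [np.diag(W_N ** (np.arange(N) * l)) for l in range(L)]
    M = np.kron(np.hstack([F[l] @ S for l in range(L)]), np.eye(M_r))   # eq. (6)
    G = np.linalg.inv(np.eye(N * M_r) + (rho / M_t) * M @ R_h @ M.conj().T)
    C = R_h - (rho / M_t) * R_h @ M.conj().T @ G @ M @ R_h               # eq. (12)
    return np.real(np.trace(C))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, M_t, M_r, L, rho = 8, 2, 2, 2, 10.0
    def rand_corr(n):                                  # synthetic PSD correlation, tr = n
        A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        R = A @ A.conj().T
        return R * n / np.real(np.trace(R))
    R_h = np.zeros((L * M_t * M_r, L * M_t * M_r), dtype=complex)
    for l in range(L):                                 # block-diagonal R_h as in (9)
        i = l * M_t * M_r
        R_h[i:i + M_t * M_r, i:i + M_t * M_r] = np.kron(rand_corr(M_t), rand_corr(M_r))
    S = rng.standard_normal((N, M_t)) + 1j * rng.standard_normal((N, M_t))
    S *= np.sqrt(N * M_t / np.real(np.trace(S.conj().T @ S)))            # constraint (4)
    print(mmse_cost(S, R_h, rho, L, M_r))
```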
Note that when R_h is nonsingular, Equation (11) is equivalent to

\hat{h} = \sqrt{\rho/M_t} \left( R_h^{-1} + \frac{\rho}{M_t} M^H(S) M(S) \right)^{-1} M^H(S)\, r.   (14)

Furthermore, applying the matrix inversion formula [4] to Equation (12) yields

𝒞_e(S) = tr\left\{ \left( R_h^{-1} + \frac{\rho}{M_t} M^H(S) M(S) \right)^{-1} \right\}.   (15)

The formulas (14) and (15) are the ones more often used in textbooks (see, e.g., [8]). However, when R_h is singular, it is not true that (14) and (15) remain valid by replacing matrix inversion with pseudo-inversion, as done in [28]. In fact, such a replacement results in an upper bound for 𝒞_e(S) because the matrix inversion formula cannot be extended to a pseudo-inversion one [24]. In general, the optimization problem (13) is a very difficult one. Sections III to V show how convex programming can be used to find its optimal solution for the case of nonsingular R_h. The extension to the case of singular R_h is addressed in Section VI.

III. CONVEX PROGRAMMING FOR OPTIMAL TRAINING SEQUENCES IN THE GENERAL CASE

The optimization problem (13) of minimizing the mean square channel estimation error is not easy to solve due to the Kronecker product structures of the correlated channel matrices and the form of the matrix M(S). In general, there is no closed-form solution to the optimization problem (13). In this section, we show that problem (13) can be equivalently transformed into a convex optimization problem whose optimal solution can be efficiently obtained. From this section to Section V, we assume that the correlation matrix R_h is nonsingular, so that formula (15) can be used. It follows from Equations (6) and (9) that

tr\left\{ \left( R_h^{-1} + \frac{\rho}{M_t} M^H(S) M(S) \right)^{-1} \right\} = tr\left\{ \left( \Lambda_h^{-1} + \frac{\rho}{M_t} Q(S) \right)^{-1} \right\},   (16)

where

Q(S) = U_h^H M^H(S) M(S) U_h = [(U_\ell^H \otimes V_\ell^H)((S^H F_\ell^H F_q S) \otimes I_{M_r})(U_q \otimes V_q)]_{\ell,q=0,1,\ldots,L-1} = [(U_\ell^H S^H F_\ell^H F_q S U_q) \otimes (V_\ell^H V_q)]_{\ell,q=0,1,\ldots,L-1}.   (17)

It was shown in [28] that necessary conditions for the optimality of a solution to problem (13) are
S
𝐻
F𝐻 ℓ F𝑚 S
is diagonal, ℓ = 0, 1, . . . , 𝐿 − 1,
(18)
=0
(19)
when 0 ≤ ℓ ∕= 𝑚 ≤ 𝐿 − 1.
Note that problem (13) always admits the optimal solution but generally Equations (18)-(19) form a system of nonlinear equations where the number of equations may excess the number of unknowns and thus its feasibility is not quite guaranteed. We will see that Equation (18) cannot be a necessary optimal condition for (13) for the case of having different Uℓ in Equation (8). Reference [28] focused on the particular case where all the transmit correlation matrices at different taps are equal to a fixed matrix scaled by different scalars, that is, R𝑡ℓ = 𝜎ℓ R, ℓ = 0, 1, . . . , 𝐿 − 1.
(20)
Using the singular value decomposition R = UΛU𝐻 , and comparing (20) with (8), we have Uℓ ≡ U for ℓ = 0, 1, . . . , 𝐿 − 1. Then the constraints in (18) can be easily fulfilled because the set of 𝐿 constraints reduces to one constraint. On the other hand, the set of all S satisfying (19) can be easily parameterized by ¯ X ¯ ∈ ℂ𝑀𝑡 ×𝑀𝑡 , S = QX,
(21)
with a fixed matrix Q ∈ ℂ^{N × M_t} constructed from M_t columns q_i ∈ ℂ^N, i = 1, \ldots, M_t, of a unitary matrix such that Q^H F_\ell^H F_m Q = Q^H F_{m-\ell} Q = [q_i^H F_{m-\ell} q_j]_{i,j=1,2,\ldots,M_t} = 0_{M_t} for 0 \le \ell \ne m \le L-1, or, equivalently,

q_i^H F_{m-\ell} q_j = 0 \quad \text{for } i, j = 1, 2, \ldots, M_t;\ 0 \le \ell \ne m \le L-1.   (22)

Usually N \ge L M_t, so the columns q_i can be easily constructed as follows [5], [10]. Take q_1 = (q_1(1), q_1(2), \ldots, q_1(N))^T with |q_1(i)| = 1/\sqrt{N}, i = 1, 2, \ldots, N, so that ||q_1|| = 1. Then q_i, i = 2, \ldots, M_t, are defined from q_1 by

q_i(k) = q_1(k) W_N^{K(i-1)k}, \quad k = 1, 2, \ldots, N,

where K is the largest integer not exceeding N/M_t, so that N \ge K M_t.
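For illustration (not part of the paper), a minimal NumPy sketch of this construction is given below, taking q_1 as a constant-modulus (here all-ones, scaled) vector and verifying the orthogonality conditions (22) and the orthonormality of the columns numerically.

```python
import numpy as np

def training_basis(N, M_t):
    """Columns q_1,...,q_{M_t} satisfying (22), built from a constant-modulus q_1."""
    K = N // M_t                                   # largest integer not exceeding N/M_t
    W_N = np.exp(-2j * np.pi / N)
    k = np.arange(1, N + 1)
    q1 = np.ones(N, dtype=complex) / np.sqrt(N)    # any |q_1(k)| = 1/sqrt(N) works
    # q_i(k) = q_1(k) * W_N^{K(i-1)k}
    Q = np.stack([q1 * W_N ** (K * i * k) for i in range(M_t)], axis=1)
    return Q, K

if __name__ == "__main__":
    N, M_t, L = 32, 4, 5
    Q, K = training_basis(N, M_t)
    W_N = np.exp(-2j * np.pi / N)
    ok = True
    for l in range(L):
        for m in range(L):
            if l == m:
                continue
            F = np.diag(W_N ** (np.arange(N) * (m - l)))       # F_{m-l}
            ok &= np.allclose(Q.conj().T @ F @ Q, 0, atol=1e-10)
    print(ok, np.allclose(Q.conj().T @ Q, np.eye(M_t)))        # expected: True True
```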
It can be easily checked that

q_i^H F_{m-\ell} q_j = \sum_{k=1}^{N} |q_1(k)|^2 W_N^{[K(i-j)-(m-\ell)]k} = \frac{1}{N} \sum_{k=1}^{N} W_N^{[K(i-j)-(m-\ell)]k} = \begin{cases} 1 & \text{for } K(i-j) = (m-\ell), \\ 0 & \text{for } K(i-j) \ne (m-\ell). \end{cases}

Since N \ge K M_t, it is obvious that K(i-j) \ne (m-\ell) for all 1 \le i, j \le M_t and 0 \le \ell \ne m \le L-1 and, hence, (22) is satisfied. Thus the q_i's are orthonormal and form columns of a unitary matrix. Our result correcting the conditions (18) and (19) is stated in the following theorem.

Theorem 1: Given that N \ge L M_t, the optimization problem (13) in S ∈ ℂ^{N × M_t} is equivalent to the following optimization problem in X ∈ ℂ^{M_t × M_t}:

𝒟^\star = \min_{0 \le X ∈ ℂ^{M_t × M_t}} \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes R_{r\ell}^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{X\} = N M_t.   (23)

The optimal solution S_{opt} to problem (13) is obtained from the optimal solution X_{opt} to problem (23) by

S_{opt} = Q X_{opt}^{1/2}.   (24)

Proof: From an inverse matrix formula (see, e.g., [29, p. 14]), the following inequality holds true for a positive definite matrix X = [A_{ij}]_{i,j=1,2,\ldots,L} with block entries A_{ij}:

tr\{X^{-1}\} \ge \sum_{i=1}^{L} tr\{A_{ii}^{-1}\},   (25)

with equality if and only if the off-diagonal entries A_{ij}, i \ne j, are zero, i.e., X = diag[A_{ii}]_{i=1,2,\ldots,L}. Therefore, the objective function 𝒞_e(S) of (13) is lower bounded as follows:

𝒞_e(S) \ge \sum_{\ell=0}^{L-1} tr\left\{ \left( (R_{t\ell} \otimes R_{r\ell})^{-1} + \frac{\rho}{M_t} S^H S \otimes I_{M_r} \right)^{-1} \right\},   (26)

where the structures (9) and (17) have been used to identify the diagonal blocks of the matrix in (13). Together with the identity (R_{t\ell} \otimes R_{r\ell})^{-1} = R_{t\ell}^{-1} \otimes R_{r\ell}^{-1}, the inequality in (26) means that the following optimization problem gives a lower bound of problem (13):

ℒ^\star = \min_{S ∈ ℂ^{N × M_t}} \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes R_{r\ell}^{-1} + \frac{\rho}{M_t} S^H S \otimes I_{M_r} \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{S^H S\} = N M_t.   (27)

That is,

𝒞_e(S_{opt}) \ge ℒ^\star,   (28)

where S_{opt} is the optimal solution of (13). As N \ge L M_t \ge M_t, it is possible to make the following variable change:

0 \le X = S^H S ∈ ℂ^{M_t × M_t}.   (29)

This is because, under such a variable change, S can be recovered from X by S = Q_N \begin{bmatrix} X^{1/2} \\ 0_{(N-M_t) \times M_t} \end{bmatrix} with any unitary matrix Q_N of dimension N × N. In other words, one has

ℒ^\star = 𝒟^\star.   (30)

Now, define S_S = Q X_{opt}^{1/2}, where X_{opt} is the optimal solution of (23). Then S_S is obviously feasible for (13) and S_S^H F_\ell^H F_q S_S = 0, \ell \ne q, by (22). Therefore,

𝒞_e(S_{opt}) \le 𝒞_e(S_S) = \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes R_{r\ell}^{-1} + \frac{\rho}{M_t} X_{opt} \otimes I_{M_r} \right)^{-1} \right\} = 𝒟^\star.   (31)

From (28), (30), and (31) it follows that 𝒞_e(S_{opt}) = 𝒟^\star, which completes the proof of Theorem 1.

One can see that the objective function in (23) is convex in X and, therefore, the optimization problem (23) is in fact a convex program. An easier way to recognize the convexity of (23) is to use the Schur complement argument [4] to write it as a semi-definite program (SDP) by introducing slack variables Y_\ell ∈ ℂ^{(M_t M_r) \times (M_t M_r)}, \ell = 0, 1, \ldots, L-1:

\min_{X, Y_\ell} \sum_{\ell=0}^{L-1} tr\{Y_\ell\} \quad \text{s.t.} \quad X \ge 0, \quad tr\{X\} = N M_t, \quad \begin{bmatrix} Y_\ell & R_{t\ell} \otimes R_{r\ell} \\ R_{t\ell} \otimes R_{r\ell} & R_{t\ell} \otimes R_{r\ell} + \frac{\rho}{M_t} (R_{t\ell} X R_{t\ell}) \otimes R_{r\ell}^2 \end{bmatrix} \ge 0.   (32)
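For illustration only (this is not the authors' implementation, which uses YALMIP as noted below), the following sketch poses (32) in the CVXPY modeling package for real-valued correlation matrices. The helper kron_expr is a hypothetical routine that expands the Kronecker product of the variable-dependent block with a constant matrix blockwise, so that no Kronecker product of a variable is needed.

```python
import numpy as np
import cvxpy as cp

def kron_expr(A, B):
    """Kronecker product of a square CVXPY expression A with a constant matrix B."""
    m = A.shape[0]
    return cp.bmat([[A[i, j] * B for j in range(m)] for i in range(m)])

def solve_sdp(Rt, Rr, rho, N):
    """Training power shaping via the SDP (32); returns X with tr(X) = N*M_t."""
    L, M_t, M_r = len(Rt), Rt[0].shape[0], Rr[0].shape[0]
    X = cp.Variable((M_t, M_t), symmetric=True)
    Ys = [cp.Variable((M_t * M_r, M_t * M_r), symmetric=True) for _ in range(L)]
    cons = [X >> 0, cp.trace(X) == N * M_t]
    for l in range(L):
        A = np.kron(Rt[l], Rr[l])                                   # R_tl (x) R_rl, constant
        C = (rho / M_t) * kron_expr(Rt[l] @ X @ Rt[l], Rr[l] @ Rr[l])
        cons.append(cp.bmat([[Ys[l], A], [A, A + C]]) >> 0)         # LMI in (32)
    prob = cp.Problem(cp.Minimize(sum(cp.trace(Y) for Y in Ys)), cons)
    prob.solve(solver=cp.SCS)
    return X.value, prob.value

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    L, M_t, M_r, N, rho = 2, 2, 2, 8, 5.0
    def corr(n):                                                    # random real PSD, tr = n
        A = rng.standard_normal((n, n))
        R = A @ A.T
        return R * n / np.trace(R)
    Rt = [corr(M_t) for _ in range(L)]
    Rr = [corr(M_r) for _ in range(L)]
    X_opt, val = solve_sdp(Rt, Rr, rho, N)
    print(np.round(X_opt, 3), round(val, 4))
```

The optimal training matrix would then be recovered as S_opt = Q X_opt^{1/2} per (24), with Q constructed as in Section III.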
Thus, problem (32) can in principle be solved by any existing SDP software, such as YALMIP with SDP solvers [12]. Note that the SDP problem (32) has L matrix variables Y_\ell of size M_t M_r × M_t M_r and one matrix variable X of dimension M_t × M_t and, hence, it can be efficiently solved using interior-point methods with a complexity of at most 𝒪(L^3 M_t^6 M_r^6 + M_t^6) arithmetic operations per iteration [13]. Before closing this section, we show that condition (18) in general cannot be a necessary optimality condition for (13) by considering the case of L = 2, M_r = 1, M_t > 2, where U_0 and U_1 in the SVD (8) are different and R_{t0} and R_{t1} are not diagonal. This means that U_0 and U_1 are not rotation
matrices.¹ Without loss of generality, set R_{r\ell} = 1, \ell = 0, 1, and problem (23) becomes

\min_{0 \le X ∈ ℂ^{M_t × M_t}} tr\left\{ \left( R_{t0}^{-1} + \frac{\rho}{M_t} X \right)^{-1} \right\} + tr\left\{ \left( R_{t1}^{-1} + \frac{\rho}{M_t} X \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{X\} = N M_t.   (33)
The condition (18) requires that both U_0^H X_{opt} U_0 and U_1^H X_{opt} U_1 be diagonal for the optimal solution X_{opt} of (33). This can only be fulfilled by X_{opt} = N I_{M_t}. Obviously, such an X_{opt} is not the optimal solution of (33). By obtaining a numerical solution of (33), it can be verified that its optimal solution does not satisfy (18). A similar convex programming problem without a simplified diagonal structure has been considered in [27, Theorem 1].

¹A matrix is called a rotation matrix if it is unitary and each of its columns has only one nonzero component.

IV. SIMPLIFIED CONVEX OPTIMIZATION WITH CLOSED-FORM SOLUTIONS

There has recently been great interest in applying convex programming to signal processing for communication systems due to its numerical efficiency [13]. However, low-complexity algorithms are always highly desirable in communication systems, particularly in channel estimation problems. This motivates us to approximately reformulate the optimization problem (23) as a tractable problem whose solution can be found efficiently. First, it is easily verified that problem (23) can be simplified to

\min_{0 \le X ∈ ℂ^{M_t × M_t}} \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes \Upsilon_\ell^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{X\} = N M_t.   (34)

We now develop some nontrivial bounds for the objective in (34).

Lemma 1: Given a positive semi-definite matrix A, the function f(X) = tr\{(X + A)^{-1}\} (the function g(X) = tr\{(X^{-1} + A)^{-1}\}, resp.) is convex (concave, resp.) in X > 0. Consequently, for A \ge 0 and B > 0 of appropriate sizes, the function tr\{(X \otimes B + A)^{-1}\} is convex, while the function tr\{(X^{-1} \otimes B + A)^{-1}\} is concave in X > 0.

Proof: As the function \gamma(X) = tr\{X^{-1}\} is strictly convex in X > 0 [4, p. 468], it is obvious that the functions f(X) = \gamma(X + A) and \gamma(A + A X A) are also convex. First, consider the function g(X) when A is positive definite. Using the matrix inversion formula [4], one has

g(X) = tr\{A^{-1}\} − tr\{(A + A X A)^{-1}\} = tr\{A^{-1}\} − \gamma(A + A X A),

where \gamma(A + A X A) is convex in X > 0. Thus, g(X) is concave in X > 0. For A \ge 0 it follows that A + \epsilon I > 0 for all \epsilon > 0, and so the function g_\epsilon(X) = tr\{(X^{-1} + A + \epsilon I)^{-1}\} is concave for all \epsilon > 0. Then g(X) = \lim_{\epsilon \to 0+} g_\epsilon(X) is also concave.

We are now ready to state the main result of this section.

Theorem 2: Define \Upsilon_{max} = diag[\gamma_{i,max}]_{i=1,2,\ldots,M_r} and \Upsilon_{min} = diag[\gamma_{i,min}]_{i=1,2,\ldots,M_r}, where

\gamma_{i,max} = \max_{\ell=0,1,\ldots,L-1} \Upsilon_\ell(i,i), \qquad \gamma_{i,min} = \min_{\ell=0,1,\ldots,L-1} \Upsilon_\ell(i,i), \qquad i = 1, 2, \ldots, M_r.

Then, the following lower and upper bounds hold true for the objective function in (34):

L^2\, tr\left\{ \left( \left( \sum_{\ell=0}^{L-1} R_{t\ell}^{-1} \right) \otimes \Upsilon_{min}^{-1} + \frac{L\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\} \le \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes \Upsilon_\ell^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\}   (35)

\le tr\left\{ \left( \left( \sum_{\ell=0}^{L-1} R_{t\ell} \right)^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{L M_t} X \otimes I_{M_r} \right)^{-1} \right\}.   (36)

Proof: As the matrices \Upsilon_\ell are diagonal, it is obvious that

\Upsilon_{max}^{-1} \le \Upsilon_\ell^{-1} \le \Upsilon_{min}^{-1},

which implies

R_{t\ell}^{-1} \otimes \Upsilon_{max}^{-1} \le R_{t\ell}^{-1} \otimes \Upsilon_\ell^{-1} \le R_{t\ell}^{-1} \otimes \Upsilon_{min}^{-1}

and the following inequalities:

\sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes \Upsilon_{min}^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\} \le \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes \Upsilon_\ell^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\} \le \sum_{\ell=0}^{L-1} tr\left\{ \left( R_{t\ell}^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\}.

The lower bound (35) is then obtained by applying the Jensen inequality [25] with Y_\ell = R_{t\ell}^{-1} to the convex function

g(Y_\ell) = tr\left\{ \left( Y_\ell \otimes \Upsilon_{min}^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\},

while the upper bound (36) is obtained by applying the Jensen inequality [25] with Y_\ell = R_{t\ell} to the concave function

g(Y_\ell) = tr\left\{ \left( Y_\ell^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{M_t} X \otimes I_{M_r} \right)^{-1} \right\}.

By Theorem 2, an upper bound minimization for (34) is provided by the following problem:

\min_{0 \le X ∈ ℂ^{M_t × M_t}} tr\left\{ \left( R^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{L M_t} X \otimes I_{M_r} \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{X\} = N M_t,   (37)

with R = \sum_{\ell=0}^{L-1} R_{t\ell}. Applying the SVD \sum_{\ell=0}^{L-1} R_{t\ell} = U \Lambda U^H, \Lambda = diag(\lambda_1, \ldots, \lambda_{M_t}), and making the variable change \tilde{X} = U^H X U, problem (37) can be rewritten as

\min_{0 \le \tilde{X} ∈ ℂ^{M_t × M_t}} tr\left\{ \left( \Lambda^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{L M_t} \tilde{X} \otimes I_{M_r} \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{\tilde{X}\} = N M_t.   (38)

It follows from Equation (25) that for any positive definite matrix \tilde{X} one has

tr\left\{ \left( \Lambda^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{L M_t} \tilde{X} \otimes I_{M_r} \right)^{-1} \right\} \ge tr\left\{ \left( \Lambda^{-1} \otimes \Upsilon_{max}^{-1} + \frac{\rho}{L M_t} diag[\tilde{X}(i,i)]_{i=1,2,\ldots,M_t} \otimes I_{M_r} \right)^{-1} \right\}.
The above immediately implies that the optimal solution to the optimization problem (38) must be in diagonal form, i.e., \tilde{X} = diag(y_1, \ldots, y_{M_t}). This means that the matrix optimization problem (38) in the variable \tilde{X} is equivalent to the following vector optimization problem in y = (y_1, y_2, \ldots, y_{M_t})^T ∈ ℝ^{M_t}:

\min_{y_i \ge 0,\ i=1,2,\ldots,M_t} \sum_{i=1}^{M_t} f_i(y_i) : \quad \sum_{i=1}^{M_t} y_i = N M_t,   (39)

where

f_i(y_i) = \sum_{j=1}^{M_r} \left( \lambda_i^{-1} \gamma_{j,max}^{-1} + \frac{\rho}{L M_t} y_i \right)^{-1} = \sum_{j=1}^{M_r} \lambda_i \gamma_{j,max} \left( 1 + \frac{\rho \lambda_i \gamma_{j,max}}{L M_t} y_i \right)^{-1}.   (40)

Using the fact that the function (x^{-1} + a)^{-1} is concave in the scalar variable x > 0, a tight upper bound for f_i(y_i) in (40) is

f_i(y_i) \le M_r \left( a_i^{-1} + \frac{\rho}{L M_t} y_i \right)^{-1} = M_r a_i \left( 1 + \frac{\rho a_i}{L M_t} y_i \right)^{-1},   (41)

where a_i = \frac{1}{M_r} \lambda_i \sum_{j=1}^{M_r} \gamma_{j,max}. As an approximation to the optimal solution of (39), we take the solution of the following optimization problem, which minimizes an upper bound of the objective function in (39):

\min_{y_i \ge 0,\ i=1,2,\ldots,M_t} \sum_{i=1}^{M_t} \frac{1}{a_i^{-1} + \frac{\rho}{L M_t} y_i} : \quad \sum_{i=1}^{M_t} y_i = N M_t.   (42)

The optimal solution to problem (42) is well known and given by

y_i = \frac{L M_t}{\rho} \max\{\mu − a_i^{-1}, 0\}, \qquad a_i = \frac{1}{M_r} \lambda_i \sum_{j=1}^{M_r} \gamma_{j,max},   (43)

where \mu > 0 is chosen so that \sum_{i=1}^{M_t} y_i = N M_t. It is basically a water-filling solution based on the following measure of each eigen-mode i (i = 1, 2, \ldots, M_t):

e_i = \lambda_i \sum_{j=1}^{M_r} \gamma_{j,max}.
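A minimal NumPy sketch (not from the paper) of the water-filling rule (43) is given below; it finds μ by bisection, under the assumption that the quantities a_i have already been computed from the eigenvalues λ_i and γ_{j,max}, for which arbitrary illustrative values are used here.

```python
import numpy as np

def waterfill(a, rho, L, M_t, N, tol=1e-12):
    """Power allocation (43): y_i = (L*M_t/rho) * max(mu - 1/a_i, 0), sum(y_i) = N*M_t."""
    a = np.asarray(a, dtype=float)
    scale = L * M_t / rho
    total = lambda mu: scale * np.maximum(mu - 1.0 / a, 0.0).sum()
    lo, hi = 0.0, (1.0 / a).max() + N * M_t / scale      # total(hi) >= N*M_t by construction
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < N * M_t else (lo, mid)
    mu = 0.5 * (lo + hi)
    return scale * np.maximum(mu - 1.0 / a, 0.0)

if __name__ == "__main__":
    # Hypothetical eigen-quantities: lam_i from sum_l R_tl, gmax_j from the Upsilon_l's
    lam = np.array([2.1, 1.2, 0.5, 0.2])
    gmax = np.array([1.5, 1.1, 0.9, 0.5])
    L, M_t, M_r, N, rho = 5, 4, 4, 32, 1.0
    a = lam * gmax.sum() / M_r                            # a_i of (43)
    y = waterfill(a, rho, L, M_t, N)
    print(y, y.sum())                                     # y.sum() should equal N*M_t = 128
```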
It should be emphasized that the optimal solution (43) can be found by a typical water-filling algorithm in less than M_t iterations [3]. It was shown in [16] that the computational complexity of the water-filling algorithm mainly depends on the computation of eigenvectors and eigenvalues of matrices, and consequently the water-filling algorithm (43) has a computational complexity of 𝒪(L M_t^3 + L M_r^3).

V. TRACTABLE OPTIMIZATION OF TRAINING SEQUENCE DESIGN FOR A PARTICULAR CASE OF CORRELATION MATRICES

In the previous sections, we have considered the general case of correlation matrices. In this section, we focus on the special
class of spatially correlated MIMO channels and propose an efficient algorithm to obtain the optimized training sequence. Specifically, we consider the optimization problem (23) for the following case of correlation matrices:

R_{t\ell} = U \Lambda_\ell U^H, \quad \ell = 0, 1, \ldots, L-1,   (44)

where the diagonal matrices \Lambda_\ell are different but the unitary matrix U is the same. Note that Equation (44) includes the case in Equation (20), which was considered in [28], as a special case. Problem (23) can be rewritten in the new variable \tilde{X} = U^H X U as follows:

\min_{0 \le \tilde{X} ∈ ℂ^{M_t × M_t}} \sum_{\ell=0}^{L-1} tr\left\{ \left( \Lambda_\ell^{-1} \otimes \Upsilon_\ell^{-1} + \frac{\rho}{M_t} \tilde{X} \otimes I_{M_r} \right)^{-1} \right\} \quad \text{s.t.} \quad tr\{\tilde{X}\} = N M_t.   (45)

Similar to the derivations in Section IV, it can be shown that the optimal solution to (45) must be in the diagonal form \tilde{X} = diag(y_1, \ldots, y_{M_t}). As such, the optimization problem (45) can be equivalently rewritten in terms of the vector variable y = (y_1, y_2, \ldots, y_{M_t})^T ∈ ℝ^{M_t} as

\min_{y_i \ge 0,\ i=1,2,\ldots,M_t} \sum_{i=1}^{M_t} f_i(y_i) : \quad \sum_{i=1}^{M_t} y_i = N M_t,   (46)

where

f_i(y_i) = \sum_{\ell=0}^{L-1} \sum_{j=1}^{M_r} \left( \lambda_{\ell i}^{-1} \gamma_{\ell j}^{-1} + \frac{\rho}{M_t} y_i \right)^{-1} = \sum_{\ell=0}^{L-1} \sum_{j=1}^{M_r} \lambda_{\ell i} \gamma_{\ell j} \left( 1 + \frac{\rho \lambda_{\ell i} \gamma_{\ell j}}{M_t} y_i \right)^{-1},   (47)

with \Lambda_\ell = diag(\lambda_{\ell 1}, \ldots, \lambda_{\ell M_t}) and \Upsilon_\ell = diag(\gamma_{\ell 1}, \ldots, \gamma_{\ell M_r}), \ell = 0, 1, \ldots, L-1. Asymptotic solutions of the above optimization problem were presented in [28] for low and high SNRs. Here, we derive the optimum solution to (46)-(47) for an arbitrary SNR. Note that each function f_i(⋅) is convex and decreasing in y_i and, hence, problem (46) is a separable convex program. Similar to our previous results in [15], [23], it can be shown that the following Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient for optimality:

\frac{\partial f_i(y_i)}{\partial y_i} + \mu − \mu_i = 0, \quad \mu > 0, \quad \mu_i \ge 0, \quad \mu_i y_i = 0, \quad i = 1, 2, \ldots, M_t,   (48)

where \mu and \mu_i are Lagrangian multipliers. This is equivalent to

y_i = \max\{y_i(\mu), 0\},   (49)

where y_i(\mu) is the solution of the nonlinear equation

\frac{\partial f_i(y_i)}{\partial y_i} + \mu = 0 \ \Leftrightarrow\ \sum_{\ell=0}^{L-1} \sum_{j=1}^{M_r} \left( \lambda_{\ell i}^{-1} \gamma_{\ell j}^{-1} + \frac{\rho}{M_t} y_i \right)^{-2} = \mu M_t / \rho,   (50)

and \mu is found such that y_i defined by (49) satisfies the constraint \sum_{i=1}^{M_t} y_i = N M_t. Because the functions in (50) are monotonically decreasing in y_i, the solutions of (50) can be quickly located by a golden-section search, while y_i(\mu) is also monotonic in \mu. All these nonlinear equations are readily solved by the iterative bisection procedure (IBP) with a few
iterative "water filling" steps. We refer the reader to [15], [23] for more details.

Furthermore, an even simpler approximation to the optimal value of the optimization problem (46)-(47) can be obtained as follows. Using the fact that the function (x^{-1} + a)^{-1} is concave in the scalar variable x > 0, a tight upper bound for f_i(y_i) in (47) is

f_i(y_i) \le L M_r \left( a_i^{-1} + \frac{\rho}{M_t} y_i \right)^{-1} = L M_r a_i \left( 1 + \frac{\rho a_i}{M_t} y_i \right)^{-1}, \qquad a_i := \frac{1}{L M_r} \sum_{\ell=0}^{L-1} \sum_{j=1}^{M_r} \lambda_{\ell i} \gamma_{\ell j}.   (51)
Consequently, an approximate solution to the optimization problem (46) can be found by minimizing the upper bound of the objective function in (46), that is,

\min_{y_i \ge 0,\ i=1,2,\ldots,M_t} \sum_{i=1}^{M_t} \frac{1}{a_i^{-1} + \frac{\rho}{M_t} y_i} : \quad \sum_{i=1}^{M_t} y_i = N M_t.   (52)

The optimal solution of (52) is also of the "water filling" form:

y_i = \frac{M_t}{\rho} \max\{\mu − a_i^{-1}, 0\},   (53)

where \mu > 0 is chosen so that \sum_{i=1}^{M_t} y_i = N M_t. Again, the optimal solution (53) is a water-filling solution based on the following measure of each eigen-mode:

e_i = \sum_{\ell=0}^{L-1} \sum_{j=1}^{M_r} \lambda_{\ell i} \gamma_{\ell j}, \quad i = 1, 2, \ldots, M_t,   (54)

and it can be found in less than M_t iterations [3]. The above solution is in sharp contrast to the solution of [28], in which the total power N M_t is allocated to the mode with the maximum measure

\tilde{e}_i = \sum_{\ell=0}^{L-1} \sum_{j=1}^{M_r} (\lambda_{\ell i} \gamma_{\ell j})^2.   (55)
As discussed in Section IV, the dominant computational cost of the water-filling algorithm is the computation of eigenvectors and eigenvalues; therefore, the algorithm (53) and the asymptotic algorithm in [28] have the same computational complexity of 𝒪(L M_t^3 + L M_r^3). As confirmed by the simulation results in Section VII, the solution in (53) offers better MMSE performance than the solution based on (55).
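Returning to the exact solution (49)-(50), the following is a small NumPy sketch (an illustration, not the authors' code) of the iterative bisection idea: an inner bisection solves (50) for y_i(μ), and an outer bisection adjusts μ until the power constraint is met. The eigenvalue arrays used in the example are arbitrary.

```python
import numpy as np

def exact_allocation(lam, gam, rho, N, tol=1e-10):
    """Solve (46)-(47) via the KKT conditions (48)-(50).
    lam[l, i] and gam[l, j] are the diagonals of Lambda_l and Upsilon_l."""
    L, M_t = lam.shape
    c = rho / M_t

    def dfi(i, y):       # -d f_i/d y_i = c * sum_{l,j} (1/(lam*gam) + c*y)^-2, cf. (50)
        t = 1.0 / (lam[:, [i]] * gam) + c * y
        return c * np.sum(t ** -2)

    def y_of_mu(i, mu):  # inner bisection; dfi is decreasing in y
        if dfi(i, 0.0) <= mu:                      # KKT slackness => y_i = 0
            return 0.0
        lo, hi = 0.0, 1.0
        while dfi(i, hi) > mu:
            hi *= 2.0
        while hi - lo > tol * max(hi, 1.0):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if dfi(i, mid) > mu else (lo, mid)
        return 0.5 * (lo + hi)

    total = lambda mu: sum(y_of_mu(i, mu) for i in range(M_t))
    lo, hi = 1e-12, max(dfi(i, 0.0) for i in range(M_t))    # total(lo) large, total(hi) = 0
    while hi - lo > tol * hi:                                # outer bisection on mu
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) > N * M_t else (lo, mid)
    mu = 0.5 * (lo + hi)
    return np.array([y_of_mu(i, mu) for i in range(M_t)])

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    L, M_t, M_r, N, rho = 5, 4, 4, 32, 1.0
    lam, gam = rng.uniform(0.1, 2.0, (L, M_t)), rng.uniform(0.1, 2.0, (L, M_r))
    y = exact_allocation(lam, gam, rho, N)
    print(y, y.sum())                                        # sum approximately N*M_t = 128
```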
−1
ℓ𝑖 ℓ𝑗 small 𝛾ℓ𝑗 , the term 𝜆ℓ𝑖 𝛾ℓ𝑗 1 + 𝑀 𝑦𝑖 is very small and 𝑡 slowly varies in 𝑦𝑖 and thus can be ignored without sacrificing
much the optimality in the computed solutions. In other words, for a threshold 𝜖 > 0, let = {ℓ ∈ {0, 1, . . . , 𝐿 − 1} : 𝜆ℓ𝑖 ≥ 𝜖}, = {𝑗 ∈ {1, 2, . . . , 𝑀𝑟 } : 𝛾ℓ𝑗 ≥ 𝜖}
𝐿𝑖 𝑀ℓ
(56)
with cardinalities ∣𝐿𝑖 ∣ and ∣𝑀ℓ ∣. Then instead of (47) we use the following approximation: )−1 ∑ ∑ ( 𝜌 −1 𝑓𝑖 (𝑦𝑖 ) ≈ 𝑓˜(𝑦𝑖 ) := 𝛾 + 𝑦 . (57) 𝜆−1 𝑖 ℓ𝑖 ℓ𝑗 𝑀𝑡 ℓ∈𝐿𝑖 𝑗∈𝑀ℓ
Accordingly the following nonlinear equation is iteratively solved in the iterative water filling [23], [15] instead of (50): )−2 ∑ ∑ ( 𝜌 −1 −1 𝑦𝑖 = 𝜇𝑀𝑡 /𝜌. (58) 𝜆ℓ𝑖 𝛾ℓ𝑗 + 𝑀𝑡 ℓ∈𝐿𝑖 𝑗∈𝑀ℓ
Furthermore, when 𝑎𝑖 is very small in (51), the function 𝑓𝑖 (𝑦𝑖 ) does not contribute much to the performance of the overall 𝑀𝑡 ∑ 𝑓𝑖 (𝑦𝑖 ) and thus it can be ignored (i.e., setting 𝑦𝑖 objective 𝑖=1
to zero). Therefore, with the definition 𝑀𝜖 = {𝑖 ∈ {1, 2, . . . , 𝑀𝑡 } : 𝑎𝑖 ≥ 𝜖}
(59)
the following optimization can be considered instead of (52): ∑ ∑ 1 min : 𝑦𝑖 = 𝑁 𝑀 𝑡 . (60) 𝜌 𝑦𝑖 ≥0, 𝑖∈𝑀𝜖 𝑎−1 𝑖 + 𝑀𝑡 𝑦𝑖 𝑖∈𝑀 𝑖∈𝑀 𝜖
𝜖
The optimal ∑solution is still given by (53) where 𝜇 > 0 is such that 𝑦𝑖 = 𝑁 𝑀 𝑡 . 𝑖∈𝑀𝜖
Now, we return back to the optimization problem (13) when the covariance matrix Rh is singular. Applying the right matrix inversion formulas yields the result shown at the top of the next page, which is still in a very complicated form. We can overcome this difficulty by performing perturbation analysis as follows. Define Rh (𝜖) for 𝜖 ≥ 0 by (9) with R𝑡ℓ and R𝑟ℓ substituted by R𝑡ℓ + 𝜖I and R𝑟ℓ + 𝜖I, respectively. Then the SVD of Rh is Rh = Uh Λ(𝜖)U𝐻 h with Uh still defined by (10) while Λ(𝜖) = diag[(Λℓ + 𝜖I) ⊗ (Υℓ + 𝜖I)]ℓ=0,1,...,𝐿−1 . Rewrite (13) in this case as lim {
min
𝜖→0+ S∈ℂ𝑁 ×𝑀𝑡
with
(𝜖) 𝒞e (S)
= tr
𝒞e(𝜖) (S) : tr{S𝐻 S} = 𝑁 𝑀𝑡 },
{(
R−1 h (𝜖)
+
(63)
)−1 } . For
𝜌 𝐻 𝑀𝑡 M (S)M(S)
each 𝜖 > 0, the optimization problem in (63) can be addressed by using any result of the previous sections. Then, by taking 𝜖 → 0+, one can see that (59)-(60) are actually well-posed and apply to both singular and nonsingular cases of the covariance matrix Rh . VII. S IMULATION R ESULTS This section presents simulation results to evaluate the performance of our proposed methods. We consider the MIMOOFDM systems equipped with 𝑀𝑡 = 4 transmit antennas and 𝑀𝑟 = 4 receive antennas. Assuming the use of an uniform
TUAN et al.: OPTIMIZED TRAINING SEQUENCES FOR SPATIALLY CORRELATED MIMO-OFDM
{
𝒞𝑒 (S)
= = = =
[
] } ( )−1 𝜌 𝜌 1/2 𝐻 1/2 1/2 tr R M (S) I + M(S)Rh M𝐻 (S) M(S)Rh Rh I− 𝑀𝑡 h 𝑀𝑡 { } ( )−1 𝜌 1/2 𝐻 1/2 1/2 1/2 tr Rh R M (S)M(S)Rh Rh I+ 𝑀𝑡 h {[ ] } 𝜌 𝜌 tr I − Rh M𝐻 (S)(I + M(S)Rh M𝐻 (S))−1 M(S) Rh 𝑀𝑡 𝑀𝑡 } {( )−1 𝜌 Rh M𝐻 (S)M(S) Rh tr I+ 𝑀𝑡 1/2 Rh
15
antenna array, the Kronecker correlation matrix model in [1] is adopted with the correlation matrices given by
[R𝑟ℓ ]𝑚,𝑛 =
(61)
(62)
SDP (32) Upper bound (43) Equi-power
10
¯
𝜎ℓ e−𝑗2𝜋∣𝑛−𝑚∣Δ𝑡 cos(𝜃𝑡ℓ ) 2 1 ¯ ×e− 2 (2𝜋∣𝑛−𝑚∣Δ𝑡 sin(𝜃𝑡ℓ )𝜎𝜃𝑡ℓ ) , ¯ 𝜎ℓ e−𝑗2𝜋∣𝑛−𝑚∣Δ𝑟 cos(𝜃𝑟ℓ ) − 12 (2𝜋∣𝑛−𝑚∣Δ𝑟 sin(𝜃¯𝑟ℓ )𝜎𝜃𝑟ℓ )2 ×e .
previous method
(64)
Here the relations of the correlation matrices to the physical system parameters are follows. ∙ Δ𝑡 = 𝑑𝑡 /𝜆 (Δ𝑟 = 𝑑𝑟 /𝜆, resp.) is the relative transmit (receive, resp.) antenna spacing while 𝑑𝑡 and 𝑑𝑟 are the absolute antenna spacings with wavelength 𝜆 = 𝑓𝑐𝑐 and carrier frequency 𝑓𝑐 . They are normalized to 1 in all simulations. ¯𝑡ℓ and 𝜃¯𝑟ℓ are the mean angle of departure from the ∙ 𝜃 transmit array and the mean angle of arrival at the receive array. 2 2 ∙ 𝜎𝜃 and 𝜎𝜃 are the cluster angle spreads perceived by 𝑡𝑙 𝑟𝑙 the transmitter and receiver, respectively, so the actual angle of transmit (arrival, resp.) is 𝜃𝑡ℓ = 𝜃¯𝑡ℓ + 𝜃ˆ𝑡𝑙 (𝜃𝑟ℓ = 𝜃¯𝑟ℓ + 𝜃ˆ𝑟𝑙 , resp.), where 𝜃ˆ𝑡𝑙 and 𝜃ˆ𝑟𝑙 are modeled as 𝒞𝒩 (0, 𝜎𝜃2𝑡ℓ ) and 𝒞𝒩 (0, 𝜎𝜃2𝑟ℓ ) random variables, respectively. They are set equal to some specific value, 𝜎𝜃 . 2 ∙ 𝜎ℓ reflects the ℓ-th path channel (power) gain. Note that a large antenna spacing and/or a large cluster angle spread results in low spatial fading correlation and vice versa. We consider the MIMO channels which have 𝐿 = 5 taps with tap gains (𝜎02 , 𝜎12 , 𝜎22 , 𝜎32 , 𝜎42 ) = (0.3, 0.2, 0.2, 0.15, 0.15), and OFDM modulation using 𝑁 = 32 subcarriers. It is further assumed that the cyclic prefix is longer enough to avoid IBI. Example 1: This example demonstrates the performance of our proposed SDP solution of (32) and shows the MMSE performance gap between the approximate problem in (43) and the SDP problem in (32). For the general case we choose the average angles of departure and arrival quite arbitrarily. In this example, the average angles of departure = (13∘ , 16∘ , 20∘ , 24∘ , 27∘ ) are (𝜃¯𝑡0 , 𝜃¯𝑡1 , 𝜃¯𝑡2 , 𝜃¯𝑡3 , 𝜃¯𝑡4 ) while the average angles of arrival are (𝜃¯𝑟0 , 𝜃¯𝑟1 , 𝜃¯𝑟2 , 𝜃¯𝑟3 , 𝜃¯𝑟4 ) = (290∘ , 300∘ , 315∘, 320∘ , 335∘ ). The MSEs of channel estimation which are computed by means of Monte Carlo simulations are plotted in Figure 1 for two cases of the angle spreads, namely 𝜎𝜃 = 5.7∘ and 𝜎𝜃 = 14.3∘ . Here, we refer to the solutions of problems (32) and (43) as SDP and upper bound ones, respectively. It is clear from Figure 1 that the channel estimation errors decrease
5 MSE (dB)
[R𝑡ℓ ]𝑚,𝑛 =
2775
V =14.3o T
0 -5
our proposed methods V =5.7o T
-10 -15 -20 -10
-5
0
5 SNR (dB)
10
15
20
Fig. 1. Performance of MMSE channel estimation by different methods: Solid lines are for 𝜎𝜃 = 14.3∘ and dash lines are for 𝜎𝜃 = 5.7∘ .
with the angle spread. It can be clearly observed that MMSE performances of our upper bound method and SDP method are almost the same. These results confirm the tightness of the upper bound optimization problem whose complexity is significantly lower. On the other hand, as compared to the equi-power method, our methods can consistently improve the MSE channel estimation performance for all regions of SNRs. Example 2: In this example, we consider the special case of the correlation matrices where the average angles of departure are the same. Specifically, we assume that 𝜃¯𝑡ℓ = 13∘ , ℓ = 0, 1, 2, 3, 4, and the angle spread is 𝜎𝜃 = 8.6∘ , while other parameters are the same as in Example 1. In this case, our proposed water-filling algorithm (53) presented in Section V, and the asymptotic MMSE method in [28] are applicable. Figure 2 demonstrates the MMSE performances of our proposed methods, asymptotic MMSE approach in [28], and equi-power method. It can be observed that all of our methods exhibit the same MMSE performance. It should be emphasized that the upper bound method in Section IV and water-filling approach in Section V have very low computational complexity because they are based on standard water-filling algorithms whose solutions are found in less than 𝑀𝑡 = 4 iterations. From Figure
2, we can see that our proposed methods outperform both the asymptotic MMSE method and the equi-power method, especially for SNRs larger than 0 dB.

Fig. 2. Performance of MMSE channel estimation by different methods.

Fig. 3. Bit error rate versus SNR for various methods of training sequence designs.
It is important to evaluate the system performance in terms of the bit error rate (BER), because the BER is the ultimate performance index in communication systems. For this evaluation and comparison, we adopt the MIMO-OFDM system parameters given in Example 2. For the transmission of information data, the Gray-mapped quadrature phase shift keying (QPSK) constellation is used together with the ℋ_4 space-time block code of rate 3/4 [20]. Maximum likelihood detection is performed at the receiver. Figure 3 shows the BER performance of the various methods. For comparison we also provide the BER for the ideal case of a perfect channel estimate. It can be seen that the asymptotic MMSE method in [28] results in almost the same BER as the equi-power method, except at low SNRs, where the asymptotic MMSE method performs slightly better. On the other hand, our proposed methods outperform the equi-power method at all SNRs.

Example 3: This simulation example investigates the impact of spatial correlation mismatch on the MMSE performance. In the preceding examples, we have assumed that perfect knowledge of the spatial correlation matrices is exploited at the transmitter to design the training sequences. In practice, the correlation matrices might be obtained by performing estimation at the receiver and feeding the estimates back to the transmitter. Therefore, there can be a mismatch between the actual correlation matrices and the presumed ones due to imperfections in the estimation and in the quality of the feedback channel. However, it should be noted that although the channel state information can change very fast, the spatial correlations vary much more slowly. Therefore, the knowledge of the channel correlation matrices is quite reliable [6]. In other words, the channel statistic mismatches are typically small. From Equation (1), the presumed correlation matrix of the \ell-th tap is given by

R_{h\ell} = R_{t\ell} \otimes R_{r\ell}.   (65)

However, due to a certain mismatch, the actual correlation matrix \hat{R}_{h\ell} can be modeled by

\hat{R}_{h\ell} = (R_{t\ell} + \epsilon_{t\ell} I) \otimes (R_{r\ell} + \epsilon_{r\ell} I).   (66)
This mismatch model is known as diagonal loading, in which \epsilon_{t\ell} and \epsilon_{r\ell} are the variances of artificial white noise [2], [19]. In the simulation, we adopt all the MIMO-OFDM parameters from Example 2. We normalize tr(R_{t\ell}) = \sigma_\ell M_t, tr(R_{r\ell}) = \sigma_\ell M_r, and take \epsilon_{t\ell} = \epsilon_{r\ell} = \epsilon. Figure 4 shows the MMSE performance of our water-filling method (53) for a varying amount of mismatch \epsilon at different SNRs. The plots of MMSE performance with no mismatch are also shown in Figure 4 for comparison purposes.² It can be seen that the spatial correlation mismatch causes a slight degradation in MMSE performance (less than 1 dB) at low SNRs, while the performance degradation is almost negligible at high SNRs. Therefore, it can be concluded that our proposed methods for the optimal design of training sequences are robust to spatial correlation mismatch.

Before closing this section, we summarize the comparison between our methods and previously existing methods in terms of MMSE and BER performance. In terms of MMSE performance, the asymptotic method [28] is better than the equi-power method, while our proposed methods outperform the asymptotic method. Furthermore, the asymptotic and equi-power methods offer almost the same BER performance, whereas our proposed methods lead to a BER performance improvement. We also summarize the computational complexity and the applicability of these methods in Table I.

²The mismatch happens if the transmitter exploits the presumed correlation matrices instead of the actual correlation matrices for the training sequence design. On the other hand, the no-mismatch case is the scenario where the transmitter perfectly knows the actual correlation matrices.
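As an illustration of the simulation setup (not the authors' code), the sketch below generates per-tap spatial correlation matrices in the spirit of (64) for a uniform linear array, together with their diagonally loaded counterparts of (66). Two points are assumptions rather than facts from the paper: the signed antenna-index difference is used in the phase term so that the matrices come out Hermitian, and the quoted tap gains are used directly as the prefactor, which may need rescaling to match the normalization tr(R_{t\ell}) = \sigma_\ell M_t.

```python
import numpy as np

def corr_matrix(n_ant, delta, theta_bar_deg, sigma_theta_deg, gain):
    """Spatial correlation matrix following the form of (64) for a uniform linear array."""
    th = np.deg2rad(theta_bar_deg)
    s_th = np.deg2rad(sigma_theta_deg)
    d = np.subtract.outer(np.arange(n_ant), np.arange(n_ant))       # antenna index difference
    phase = np.exp(-1j * 2 * np.pi * d * delta * np.cos(th))
    spread = np.exp(-0.5 * (2 * np.pi * d * delta * np.sin(th) * s_th) ** 2)
    return gain * phase * spread

if __name__ == "__main__":
    M_t = M_r = 4
    L = 5
    gains = [0.3, 0.2, 0.2, 0.15, 0.15]           # tap power gains from Section VII
    theta_t = [13, 16, 20, 24, 27]                # mean angles of departure (Example 1)
    theta_r = [290, 300, 315, 320, 335]           # mean angles of arrival (Example 1)
    sigma_theta, delta, eps = 5.7, 1.0, 0.05      # angle spread, antenna spacing, mismatch
    Rt = [corr_matrix(M_t, delta, theta_t[l], sigma_theta, gains[l]) for l in range(L)]
    Rr = [corr_matrix(M_r, delta, theta_r[l], sigma_theta, gains[l]) for l in range(L)]
    Rh = [np.kron(Rt[l], Rr[l]) for l in range(L)]                             # presumed (65)
    Rh_hat = [np.kron(Rt[l] + eps * np.eye(M_t),
                      Rr[l] + eps * np.eye(M_r)) for l in range(L)]            # mismatched (66)
    print(Rh[0].shape, np.allclose(Rh[0], Rh[0].conj().T))                     # (16, 16) True
```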
TABLE I
COMPUTATIONAL COMPLEXITY AND APPLICABILITY OF DIFFERENT TRAINING DESIGN METHODS

Method                           | Computational Complexity   | Applicability
Previously proposed equi-power¹  | 𝒪(L M_t³ + L M_r³)         | arbitrary correlation matrices and any SNRs
Previously proposed asymptotic   | 𝒪(L M_t³ + L M_r³)         | restrictive correlation matrices and extreme SNRs
Proposed SDP                     | 𝒪(L³ M_t⁶ M_r⁶ + M_t⁶)     | any correlation matrices and arbitrary SNRs
Proposed upper bound             | 𝒪(L M_t³ + L M_r³)         | any correlation matrices and arbitrary SNRs
Proposed water filling           | 𝒪(L M_t³ + L M_r³)         | restrictive correlation matrices and arbitrary SNRs

¹Note that although the equi-power method does not optimize the power allocation, it needs to compute eigenvectors of correlation matrices for transmit directions.
Fig. 4. MMSE versus the varying amount of correlation mismatch for different SNRs.
VIII. CONCLUSIONS

In this paper, we have considered the optimal training sequence design for spatially correlated MIMO-OFDM systems. Compared to previously known methods, which address an approximate problem for particular correlation matrices at extreme SNRs, our method casts the optimization problem of training sequence design as an SDP problem and, hence, the optimal training sequence for arbitrary correlation matrices and any SNRs can be efficiently found by numerical algorithms. Furthermore, we have also derived approximate solutions that can be obtained with much lower complexity and perform essentially as well as the optimal solution. The simulation results confirm that our proposed methods outperform the asymptotic MMSE method and the equi-power method in terms of both MMSE and BER performance. Finally, we have also considered the impact of channel spatial correlation mismatch at the transmitter on the training sequence design. The simulation results reveal that the MSE performance of our methods is very robust to the spatial correlation mismatch.

ACKNOWLEDGEMENT

The authors would like to thank the anonymous reviewers for their helpful comments and suggestions, which greatly improved the presentation of this paper.

REFERENCES
[1] H. Bölcskei, "Principles of MIMO-OFDM wireless systems," CRC Handbook on Signal Processing for Communications, M. Ibnkahla, ed., 2004.
[2] C. Y. Chen and P. P. Vaidyanathan, "Quadratically constrained beamforming robust against direction-of-arrival mismatch," IEEE Trans. Signal Process., vol. 55, no. 8, pp. 4139–4150, Aug. 2007.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[4] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[5] I. Barhumi, G. Leus, and M. Moonen, "Optimal training design for MIMO OFDM systems in mobile wireless channels," IEEE Trans. Signal Process., vol. 51, no. 6, pp. 1615–1624, June 2003.
[6] S. A. Jafar and A. Goldsmith, "Multiple-antenna capacity in correlated Rayleigh fading with channel covariance information," IEEE Trans. Wireless Commun., vol. 4, no. 3, pp. 990–997, May 2005.
[7] D. Katselis, E. Kofidis, and S. Theodoridis, "On training optimization for estimation of correlated MIMO channels in the presence of multiuser interference," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4892–4904, Oct. 2008.
[8] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, 1993.
[9] J. H. Kotecha and A. M. Sayeed, "Transmit signal design for optimal estimation of correlated MIMO channels," IEEE Trans. Signal Process., vol. 52, no. 2, pp. 546–557, Feb. 2004.
[10] Y. G. Li, "Simplified channel estimation for OFDM systems with multiple transmit antennas," IEEE Trans. Wireless Commun., vol. 1, no. 1, pp. 67–75, Aug. 2002.
[11] Y. Liu, T. F. Wong, and W. W. Hager, "Training signal design for estimation of correlated MIMO channels with colored interference," IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1486–1497, Apr. 2007.
[12] J. Lofberg, "YALMIP: a toolbox for modeling and optimization in Matlab," in Proc. 2004 IEEE International Symposium on Computer Aided Control System Design, Taipei, Taiwan, Sep. 2004, pp. 284–188.
[13] Z.-Q. Luo and W. Yu, "An introduction to convex optimization for communications and signal processing," IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1426–1438, Aug. 2006.
[14] H. Miao and M. Juntti, "Optimal power allocation and design of pilot symbols for frequency-selective correlated MIMO channel estimation," in Proc. International Symp. Personal, Indoor and Mobile Radio Communications, Sep. 2005, pp. 171–175.
[15] V. Nguyen, H. D. Tuan, H. H. Nguyen, and N. N. Tran, "Optimal superimposed training design for spatially correlated fading MIMO channels," IEEE Trans. Wireless Commun., vol. 7, no. 8, pp. 3206–3217, Aug. 2008.
[16] D. P. Palomar and R. Fonollosa, "Practical algorithms for a family of waterfilling solutions," IEEE Trans. Signal Process., vol. 53, no. 2, pp. 686–695, Feb. 2005.
[17] J. Pang, J. Li, L. Zhao, and Z. Lu, "Optimal training sequences for MIMO channel estimation with spatial correlation," in Proc. IEEE Vehicular Technology Conf., Sep. 2007, pp. 651–665.
[18] C. Rao, Linear Statistical Inference and its Applications. John Wiley and Sons, 1973.
[19] S. Shahbazpanahi, A. B. Gershman, Z. Q. Luo, and K. M. Wong, "Robust adaptive beamforming for general-rank signal models," IEEE Trans. Signal Process., vol. 51, no. 9, pp. 2257–2269, Sep. 2003.
[20] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block coding for wireless communications: performance results," IEEE J. Sel. Areas Commun., vol. 17, no. 3, pp. 451–460, Mar. 1999.
[21] N. N. Tran, H. D. Tuan, and H. H. Nguyen, "Superimposed training designs for spatially correlated MIMO-OFDM systems," IEEE Trans. Wireless Commun., vol. 9, no. 3, pp. 876–880, Mar. 2010.
[22] H. D. Tuan, V. J. Luong, and H. H. Nguyen, "Optimization of training sequences for spatially correlated MIMO-OFDM," in Proc. 2009 IEEE Conf. on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 2681–2684.
[23] H. D. Tuan, H. H. Nguyen, N. N. Tran, and V. Nguyen, "An effective global optimization algorithm for wireless MIMO channel," in Proc. IEEE CAMSAP 2007, Dec. 2007, pp. 225–228.
[24] H. D. Tuan, D. H. Pham, B. Vo, and T. Q. Nguyen, "Entropy of general Gaussian distributions and MIMO channel capacity maximizing precoder and decoder," in Proc. 2007 IEEE Conf. on Acoustics, Speech and Signal Processing (ICASSP), Hawaii, USA, Apr. 2007, pp. 325–328.
[25] H. Tuy, Convex Analysis and Global Optimization. Kluwer Academic, 1997.
[26] J. Wang and K. Araki, "Pilot-symbol aided channel estimation in spatially correlated multiuser MIMO-OFDM channels," in Proc. IEEE Vehicular Technology Conf., Sep. 2004, pp. 33–37.
[27] W. Yu, W. Rhee, S. Boyd, and J. Cioffi, "Iterative water-filling for Gaussian vector multiple-access channels," IEEE Trans. Inf. Theory, vol. 50, no. 1, pp. 145–152, Jan. 2004.
[28] H. Zhang, Y. G. Li, A. Reid, and J. Terry, "Optimum training symbol design for MIMO OFDM in correlated fading channels," IEEE Trans. Wireless Commun., vol. 5, no. 9, pp. 2343–2347, Sep. 2006.
[29] K. Zhou and J. Doyle, Essentials of Robust Control. Prentice-Hall, 1998.

Hoang Duong Tuan (M'94) was born in Hanoi, Vietnam, in 1964. He received the diploma and the Ph.D. degree in applied mathematics from Odessa State University, Ukraine, in 1987 and 1991, respectively. From 1991 to 1994, he was a Researcher with the Optimization and Systems Division, Vietnam National Center for Science and Technologies. He was an Assistant Professor in the Department of Electronic-Mechanical Engineering, Nagoya University, Japan, from 1994 to 1999, and an Associate Professor in the Department of Electrical and Computer Engineering, Toyota Technological Institute, Nagoya, from 1999 to 2003. Presently, he is an Associate Professor in the School of Electrical Engineering and Telecommunications, the University of New South Wales, Sydney, Australia. His research interests include theoretical developments and applications of optimization-based methods in many areas of control, signal processing, communication, and bioinformatics.

Ha Hoang Kha was born in Dong Thap, Vietnam, in 1977. He received the B.Eng. and M.Eng. degrees in Electrical Engineering and Telecommunications from Ho Chi Minh City University of Technology in 2000 and 2003, respectively, and the Ph.D. degree in Electrical Engineering and Telecommunications from the University of New South Wales, Sydney, Australia, in 2009. From 2000 to 2004, he was a research and teaching assistant with the Department of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology. He is currently a visiting research fellow at the School of Electrical Engineering and Telecommunications, the University of New South Wales, Australia. His research interests are in digital signal processing and wireless communications, with a recent emphasis on convex optimization techniques in signal processing for wireless communications.

Ha H. Nguyen (M'01-SM'05) received the B.Eng. degree from Hanoi University of Technology, Hanoi, Vietnam, in 1995, the M.Eng. degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the Ph.D. degree from the University of Manitoba, Winnipeg, MB, Canada, in 2001, all in electrical engineering. He joined the Department of Electrical Engineering, University of Saskatchewan, Saskatoon, SK, Canada, in 2001, where he is currently a full Professor. He holds adjunct appointments at the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada, and TRLabs, Saskatoon. He was a Senior Visiting Fellow in the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia, from October 2007 to June 2008. His research interests include digital communications, spread-spectrum systems, and error-control coding. He is a coauthor (with E. Shwedyk) of A First Course in Digital Communications (Cambridge, U.K.: Cambridge University Press, 2009). Dr. Nguyen currently is an Associate Editor for the IEEE Transactions on Wireless Communications and the IEEE Transactions on Vehicular Technology. He is a Registered Member of the Association of Professional Engineers and Geoscientists of Saskatchewan.

V.-J. Luong received the B.Eng. degree from the Posts and Telecommunications Institute of Technology, Hanoi, Vietnam, in 2004, and the Master degree in Telecommunications from the University of New South Wales, Australia, in 2010. Mr. Luong currently works for Optus Ltd., NSW, Australia.