FDD Massive MIMO via UL/DL Channel Covariance Extrapolation and Active Channel Sparsification

arXiv:1803.05754v1 [cs.IT] 15 Mar 2018

Mahdi Barzegar Khalilsarai∗, Saeid Haghighatshoar∗, Xinping Yi†, and Giuseppe Caire∗

Abstract

We propose a novel method for massive Multiple-Input Multiple-Output (massive MIMO) in Frequency Division Duplexing (FDD) systems. Due to the large frequency separation between Uplink (UL) and Downlink (DL), in FDD systems channel reciprocity does not hold. Hence, in order to provide DL channel state information to the Base Station (BS), closed-loop DL channel probing and feedback is needed. In massive MIMO this incurs typically a large training overhead. For example, in a typical configuration with $M \approx 200$ BS antennas and fading coherence block of $T \approx 200$ symbols, the resulting rate penalty factor due to the DL training overhead, given by $\max\{0, 1 - M/T\}$, is close to 0. To reduce this overhead, we build upon the observation that the Angular Scattering Function (ASF) of the user channels is invariant over the frequency domain. We develop a robust and stable method to estimate the users’ DL channel covariance matrices from pilots sent by the users in the UL. The resulting DL covariance information is used to optimize a sparsifying precoder, in order to limit the effective channel dimension of each user channel to be not larger than some desired DL pilot dimension $T_{dl}$. In this way, we can maximize the rank of the effective sparsified channel matrix subject to a desired training overhead penalty factor $\max\{0, 1 - T_{dl}/T\}$. We pose this problem as a Mixed Integer Linear Program, that can be efficiently solved. Furthermore, each user can simply feed back its $T_{dl}$ pilot measurements. Thus, the proposed approach yields also a small feedback overhead and delay. We provide simulation results demonstrating the superiority of the proposed approach with respect to state-of-the-art “compressed DL pilot” schemes based on compressed sensing.

Index Terms

FDD massive MIMO, downlink covariance estimation, active channel sparsification.
∗ Communications and Information Theory Group, Technische Universität Berlin ({m.barzegarkhalilsarai, saeid.haghighatshoar, caire}@tu-berlin.de). † Department of Electrical Engineering and Electronics, University of Liverpool ([email protected]).
I. INTRODUCTION

Multiuser Multiple-Input Multiple-Output (MIMO) consists of exploiting multiple antennas at the Base Station (BS) side, in order to multiplex over the spatial domain multiple data streams to multiple users sharing the same time-frequency transmission resource (channel bandwidth and time slots). It is well known that, for a block-fading channel with coherence block of T symbols,¹ the high-SNR sum-capacity behaves as $C(\mathrm{SNR}) = M^*(1 - M^*/T)\log \mathrm{SNR} + O(1)$, where $M^* = \min\{M, K, T/2\}$, M denotes the number of BS antennas, and K denotes the number of single-antenna users (e.g., see [2, 3] and references therein). When the number of BS antennas and users is potentially very large, the system multiplexing gain² is maximized by serving $K = T/2$ data streams (users). While any number $M \ge K$ of BS antennas yields the same (optimal) multiplexing gain, a key observation made in [4] is that, when training a very large number of antennas comes at no additional overhead cost, it is indeed convenient to use $M \gg K$ antennas at the BS. In this way, at the cost of some additional hardware complexity, very significant benefits at the system level can be achieved. These include: i) energy efficiency (due to the large beamforming gain); ii) inter-cell interference reduction; iii) a dramatic simplification of user scheduling and rate adaptation, due to the inherent large-dimensional channel hardening [5]. Systems for which the number of BS antennas M is much larger than the number of DL data streams K are generally referred to as massive MIMO (see [4–6] and references therein). Massive MIMO has been the object of intense research investigation and development and is expected to be a cornerstone of the forthcoming 5th generation of wireless/cellular systems [7].

¹ This is the number of signal dimensions over which the fading channel coefficients can be considered constant over time and frequency [1].
² The system multiplexing gain indicates the number of virtual parallel interference-free Gaussian channels that the system can create. This is referred to also as sum Degrees of Freedom or “pre-log” factor, since it manifests itself as the factor multiplying log SNR in the high-SNR expression of C(SNR).

In order to achieve the benefits of massive MIMO, the BS must learn the downlink channel coefficients for K users and $M \gg K$ BS antennas. For Time Division Duplexing (TDD) systems, due to the inherent Uplink-Downlink (UL-DL) channel reciprocity [2], this can be obtained from K mutually orthogonal UL pilots transmitted by the users. Unfortunately, the UL-DL channel reciprocity does not hold for Frequency Division Duplexing (FDD) systems, since the UL and DL channels are separated in frequency by much more than the channel coherence bandwidth [1]. Hence, unlike TDD systems, in FDD the BS must actively probe the DL channel by sending a common DL pilot signal, and request the users to feed their channel state back. In order to obtain a “fresh” channel estimate for each coherence block, $T_{dl}$ out of T symbols per coherence block must be dedicated to the DL common pilot. Assuming (for simplicity of exposition) a delay-free channel state feedback, the resulting DL multiplexing gain is given by $K \times \max\{0, 1 - T_{dl}/T\}$, where K is the number of DL data streams sent by spatial multiplexing, and $\max\{0, 1 - T_{dl}/T\}$
is the penalty factor incurred by DL channel training. Conventional DL training consists of sending orthogonal pilot signals from each BS antenna. Thus, in order to train M antennas, the minimum required training dimension is $T_{dl} = M$. Hence, with such a scheme, the number of BS antennas M cannot be made arbitrarily large. For example, consider a typical case taken from the LTE system [8], where groups of users are scheduled over resource blocks spanning 14 OFDM symbols × 12 subcarriers, for a total dimension of T = 168 symbols in the time-frequency plane. Consider a typical massive MIMO configuration serving K ∼ 20 users with M ≥ 200 antennas (e.g., see [9]). In this case $T_{dl} = M > T$, so the penalty factor is 0: the entire resource block dimension would be consumed by the DL pilot, leaving no room for data communication. Furthermore, feeding back the M-dimensional measurements (or estimated/quantized channel vectors) represents also a significant feedback overhead for the UL [10–14].

While the argument above is kept informal on purpose, for simplicity of exposition, it can be made information-theoretically rigorous. The central issue is that, if one insists on estimating the K × M channel matrix in an “agnostic” way, i.e., without exploiting its fine structure, a hard dimensionality bottleneck kicks in and fundamentally limits the achievable multiplexing gain. It follows that gathering “massive MIMO gains” in FDD systems is a challenging problem. On the other hand, current wireless networks are mostly based on FDD. Such systems are easier to operate and more effective than TDD systems in situations with symmetric traffic and delay-sensitive applications [15–17]. In addition, converting current FDD systems to TDD would represent a non-trivial cost for wireless operators. With these motivations in mind, a significant effort has recently been devoted to reducing the common DL training dimension and feedback overhead, so as to materialize significant massive MIMO gains also for FDD systems.

A. Related works: compressed DL pilots

Several works have proposed to reduce both the DL training and UL feedback overheads by exploiting the sparse structure of the massive MIMO channel. In particular, it has been observed that the propagation between the BS array and the user antenna occurs through a limited number of scattering clusters, with
limited support³ in the Angle-of-Arrival/Angle-of-Departure (AoA-AoD) domain.⁴ Hence, by decomposing the angle domain into discrete “virtual beam” directions, the M-dimensional user channel vectors admit a sparse representation in the beam-space domain (e.g., see [18, 19]). Building on this idea, a large number of works (e.g., see [17, 20–23]) proposed to use “compressed pilots”, i.e., a reduced DL pilot dimension $T_{dl} < M$, in order to estimate the channel vectors using Compressed Sensing (CS) techniques [24, 25]. In [22], the authors noticed that the angles of the multipath components of the channel are independent of the subcarrier index in an OFDM system. Hence, by probing the channel over multiple coherence blocks in the frequency domain, the common sparsity across these measurements can be exploited. This gives rise to a so-called Multiple Measurement Vector (MMV) setting, arising when multiple snapshots of a random vector with common sparse support can be acquired and jointly processed (e.g., see [26, 27]).

³ Throughout the paper the term “support” indicates a set of intervals/indices over which a function/vector has non-zero value.
⁴ From the BS perspective, AoD for the DL and AoA for the UL indicate the same domain. Hence, we shall simply refer to this as the “angle domain”, while the meaning of departure (DL) or arrival (UL) is clear from the context.

Another perhaps more interesting approach is presented in [17], where joint sparsity across different users is exploited. This work starts with the observation that, as shown in many experimental studies [28–31], the propagation between the BS antenna array and the users occurs along given scattering clusters, that may be common to multiple users, since they all belong to the same scattering environment. In turn, this implies that the sparse channel representations (in the angle/beam-space domain) share a common part of their support. In the scheme proposed in [17], the users feed back their noisy DL pilot measurements to the BS, and the latter runs a joint recovery algorithm, coined as Joint Orthogonal Matching Pursuit (J-OMP), able to take advantage of the common sparsity. It follows that in the presence of common sparsity, J-OMP improves upon the basic CS schemes that estimate each user channel separately.

B. Contribution

The focus of this paper is an efficient scheme for massive MIMO in FDD systems. Our goal is to reduce the DL training overhead, i.e., the number $T_{dl}$ of DL pilot dimensions that the BS transmits in each slot of T symbols to probe the DL channel, while preserving a large number K of users per slot that can be served using spatial multiplexing. Similar to [17], we consider a scheme where each user sends back its $T_{dl}$ noisy pilot observations per slot, using unquantized analog feedback (see [10, 11]). Hence, achieving a small $T_{dl}$ yields both a reduction of the DL training overhead and a reduction of the UL feedback overhead. However, unlike all the works mentioned before, we make no use of CS. As a matter of fact, CS is relevant when the positions of the non-zero elements in the target sparse vectors are not known. In contrast, if the support is known, classical Bayesian or ML estimation generally yields a better estimation performance. Hence, the key problem is how to guess precisely the support, i.e., the position of the non-zero elements, of the DL user channels in the angle/beam-space representation.

The angular support of the channel vector is encoded in its covariance matrix, which can be expressed as an integral transform of the channel Angular Scattering Function (ASF), describing the distribution of the signal power over the angle domain (see Section II). Over frequency intervals that are small with respect to the carrier frequency, the channel ASF is essentially invariant with respect to frequency. For example, consider the LTE-IMT band [32], where the UL takes place at [1920, 1980] MHz, and the DL takes place at [2110, 2170] MHz. The overall spanned frequency interval of 250 MHz is almost one order of magnitude smaller than the center frequency of 2045 MHz. In these conditions, the reciprocity of the ASF (i.e., same ASF for UL and DL) is known to hold (e.g., see [33–35]).

The first problem addressed in this paper is how to estimate the DL channel covariance matrix from UL pilot symbols (see Section III). Notice that UL pilot symbols are sent anyway in order to enable a coherent multiuser MIMO reception in the UL (e.g., using the linear Minimum MSE (MMSE) or the linear Zero-Forcing (ZF) detector [4, 6, 36]). Hence, the covariance estimation does not involve any additional training overhead with respect to what is done in any system based on coherent pilot-aided detection. Our approach consists of estimating the channel ASF of each user from UL pilots, and using it to “extrapolate” the covariance matrix from UL to DL. As shown in our recent work [37], this extrapolation problem is non-trivial and must be posed in a robust min-max sense.
In fact, the same UL covariance matrix corresponds to many different possible DL covariance matrices, consistent with the UL antenna correlation. In [37] we show that robust covariance reconstruction can be obtained by suitably truncating the spatial correlation function outside the interval. In the present paper, we apply this approach together with a new convex optimization method to estimate the ASF from the UL sample covariance matrix obtained from UL pilots.

The second problem addressed in this paper is how to effectively and artificially reduce each user channel dimension, such that a single common DL pilot of assigned dimension $T_{dl}$ is sufficient to estimate a large number of user channels (see Section IV). In the CS-based works reviewed above, the pilot dimension depends on the channel sparsity level s (number of non-zero components in the angle/beam-space domain). In fact, standard CS theory states that stable sparse signal reconstruction is possible using $T_{dl} = O(s \log M)$ measurements.⁵ In a rich scattering situation, s is large irrespective of whether or not the users have common sparsity as in [17]. Hence, these CS-based methods may or may not work well, depending on the propagation environment.

⁵ As commonly defined in the CS literature, we say that a reconstruction method is stable if the resulting MSE vanishes as 1/SNR, where SNR denotes the Signal-to-Noise Ratio of the measurements.

In order to allow channel estimation with an assigned pilot dimension $T_{dl}$, we use the DL covariance information in order to design an optimal sparsifying precoder. This is a linear transformation that depends only on the channel second order statistics (DL covariances) and that imposes that the effective channel matrix (including the precoder) has large rank and yet each column has sparsity not larger than $T_{dl}$. We cast the optimization of the sparsifying precoder as a Mixed Integer Linear Program (MILP), which can be efficiently solved using standard off-the-shelf solvers.

II. SYSTEM SETUP

A. Channel model

We consider the COST 2100 channel model as the basic setup for modeling a propagation environment [38]. This model is a geometry-based stochastic channel model (GSCM) that describes the properties of the channel in time, frequency and space. The propagation model consists of clusters of multipath components (MPCs) and visibility regions as its building blocks. A cluster is a group of MPCs, generated by the reflection of the signal from the objects in the environment. Each cluster is associated to a visibility region, and users inside its visibility region are coupled with the BS array through the corresponding cluster. Since the visibility regions partially overlap, this creates partially overlapping scattering for the users as in [17]. Fig. 1 shows the concept of the COST 2100 propagation model.

Fig. 1: A sketch of the clusters and visibility regions in the COST2100 model. (Figure labels: Base Station; Cluster: a group of MPCs; Visibility regions of clusters; A mobile user inside two visibility regions.)
This model implies that the scattering geometry of the channel between the BS antenna array and the UE antenna remains constant over time intervals corresponding to the UE remaining in the same intersection of visibility regions. Since moving across the regions occurs at a time scale much larger than moving across one wavelength, it is safe to assume that the channel scattering geometry is locally stationary over intervals much longer than the time scale of the transmission of channel codewords. Such fixed geometry yields the so-called Wide Sense Stationary Uncorrelated Scattering (WSSUS) channel model, for which the channel vectors evolve in time according to a WSS process. Also, we use the ubiquitous block-fading approximation, and assume that the channel random process can be approximated as locally piecewise constant over blocks of T time-frequency symbols, where $T \approx W_c T_c$, $W_c$ denoting the channel coherence bandwidth and $T_c$ denoting the channel coherence time [1].

B. Array and signaling model
We consider a BS equipped with a uniform linear array (ULA) with $M \gg 1$ antennas and single-antenna UEs. In an FDD system, communication takes place over two disjoint frequency bands. The UEs transmit to the BS over the frequency interval $[f_{ul} - \frac{W_{ul}}{2}, f_{ul} + \frac{W_{ul}}{2}]$, where $f_{ul}$ is the UL carrier frequency and $W_{ul}$ is the UL bandwidth. Likewise, the BS transmits to the UEs over the frequency band $[f_{dl} - \frac{W_{dl}}{2}, f_{dl} + \frac{W_{dl}}{2}]$, where $f_{dl}$ is the DL carrier frequency and $W_{dl}$ is the DL bandwidth. The channel bandwidth is always much less than the carrier frequency, i.e., $\frac{W_{ul}}{f_{ul}} \ll 1$, $\frac{W_{dl}}{f_{dl}} \ll 1$. Let $\alpha = \frac{f_{dl}}{f_{ul}}$ denote the ratio between the DL and the UL carrier frequencies. Notice that in FDD systems in operation today we always have $\alpha > 1$ (e.g., see [32]). A general form for the above mentioned WSSUS channel model in the time-frequency-antenna domain is given by

$$h(t, f) = \int_{\Theta} \rho(t, d\theta)\, a(\theta, f) \in \mathbb{C}^M, \tag{1}$$

where $\Theta := [-\theta_{max}, \theta_{max})$ is the angular range scanned by the ULA, the vector $a(\theta, f) \in \mathbb{C}^M$ is the array response at frequency f and angle θ, with m-th element given by

$$[a(\theta, f)]_m = e^{j 2\pi \frac{f}{c_0} m d \sin\theta}, \tag{2}$$
where $c_0$ denotes the speed of light and d the distance between two consecutive antennas, and $\rho(t, d\theta)$ is a random gain dependent on the time t and the angle range $[\theta, \theta + d\theta]$. Assuming for simplicity no line-of-sight propagation, we model $\rho(t, d\theta)$ to be a zero-mean Gaussian stochastic process with independent increments with respect to θ (uncorrelated scattering) and WSS with respect to t. The angular autocorrelation function is given by

$$\mathbb{E}\big[\rho(t, d\theta)\,\rho^*(t, d\theta')\big] = \gamma(d\theta)\,\delta(\theta - \theta'), \tag{3}$$
where $\gamma(d\theta)$ is the channel ASF, modeling the power received from scatterers located at any angular interval. It is convenient to assume that $\gamma(d\theta)$ is a normalized density function, such that $\int_{\Theta} \gamma(d\theta) = 1$. Based on the narrowband assumption we consider the array response to be a constant function of frequency over each of the UL and DL bands separately and write $a_{ul}(\theta) := a(\theta, f_{ul})$ and $a_{dl}(\theta) := a(\theta, f_{dl})$. We let $d = \kappa \frac{\lambda_{ul}}{2 \sin(\theta_{max})}$, where $\lambda_{ul} = \frac{c_0}{f_{ul}}$ is the UL carrier wavelength and κ is the spatial oversampling factor, usually (including here) set to κ = 1. With this definition we have that $[a_{ul}(\theta)]_m = e^{j m \pi \frac{\sin(\theta)}{\sin(\theta_{max})}}$ and $[a_{dl}(\theta)]_m = e^{j m \pi \alpha \frac{\sin(\theta)}{\sin(\theta_{max})}}$. Notice that the exponents of the array response elements for UL and DL differ by the factor α, which is typically slightly larger than 1 (e.g., for the LTE-IMT bands we have $\alpha = \frac{2140}{1950} \approx 1.1$ [32]).
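The UL and DL array responses above can be illustrated with a minimal NumPy sketch. The function name `array_response`, the array size and the scan range are illustrative assumptions (not taken from the paper); the antenna index is assumed to run over m = 0, ..., M − 1 and κ = 1.

```python
import numpy as np

def array_response(theta, M, theta_max, alpha=1.0):
    """ULA response with [a]_m = exp(j*m*pi*alpha*sin(theta)/sin(theta_max)), m = 0..M-1.

    alpha = 1 gives the UL response a_ul; alpha = f_dl/f_ul gives the DL response a_dl.
    Assumes antenna spacing d = lambda_ul / (2 sin(theta_max)), i.e. kappa = 1.
    """
    m = np.arange(M)
    return np.exp(1j * np.pi * alpha * m * np.sin(theta) / np.sin(theta_max))

M = 8                      # illustrative array size (assumption)
theta_max = np.pi / 3      # illustrative scan range (assumption)
alpha = 2140 / 1950        # LTE-IMT carrier ratio quoted in the text
theta = np.deg2rad(20.0)   # illustrative angle (assumption)

a_ul = array_response(theta, M, theta_max)
a_dl = array_response(theta, M, theta_max, alpha)
```

For the same angle, the per-antenna phase increment of `a_dl` is α times that of `a_ul`, which is the only difference between the two responses under the narrowband assumption.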
The channel vector covariance matrix is thereby given as follows

$$C_h(f) = \mathbb{E}\big[h(t, f) h(t, f)^H\big] = \int_{\Theta} \gamma(d\theta)\, a(\theta, f) a(\theta, f)^H, \tag{4}$$

which is time-invariant due to stationarity. The dependence of the covariance matrix on frequency is due to the fact that, as discussed before, the array response vector is a function of frequency. The covariance matrix is Toeplitz, positive semidefinite and Hermitian symmetric⁶ and hence can be described by its first column $c(f)$ as $C_h(f) = \mathcal{T}(c(f))$, where the first column is given by $c(f) = \int_{\Theta} \gamma(d\theta)\, a(\theta, f)$. We denote UL and DL covariance matrices by $C_{ul} := C_h(f_{ul})$ and $C_{dl} := C_h(f_{dl})$, respectively.

⁶ For $x \in \mathbb{C}^M$, we let $\mathcal{T}(x)$ denote the Toeplitz Hermitian matrix with first column x, i.e., with (i, j)-th element $[\mathcal{T}(x)]_{i,j} = x_{i-j}$ for $i \ge j$ and $[\mathcal{T}(x)]_{i,j} = x^*_{|i-j|}$ for $i < j$. If x is a sampled autocorrelation function, then $\mathcal{T}(x)$ is positive semidefinite.

III. DL COVARIANCE ESTIMATION FROM UL PILOTS

The proposed DL covariance estimation method exploits the assumption that the channel ASF is the same for UL and DL [33–35]. In the UL, we are in the presence of a block-fading K-user multiple access channel where the BS receiver has $M \ge K$ antennas and the fading coherence block comprises T time-frequency symbols. When the number of users K is less than T/2, a simple cut-set bound argument together with the high-SNR capacity result for the block-fading non-coherent MIMO channel in [39] (see also [3]) yields that the maximum UL multiplexing gain for isotropic channel vectors is given by $K \times (1 - K/T)$. This multiplexing gain is achievable by letting the K users send K mutually orthogonal pilot sequences in the UL, which are used by the BS for channel estimation and multiuser MIMO detection. As a matter of fact, this is indeed the standard mode of operation of the massive MIMO UL (e.g., see [4, 6, 36]), and shall be assumed here as well.

Since the UL pilots are orthogonal, the user channel vectors are mutually independent, and we assume Additive White Gaussian Noise (AWGN), the estimation of each UL user channel is decoupled. Hence, we can focus on the estimation of a generic user and neglect its index. The received UL pilot observation during the i-th UL slot, after projecting over the orthogonal pilot sequence of the given generic user, is given by [4]

$$y[i] = h_{ul}[i] + n[i], \tag{5}$$
where $h_{ul}[i] := h(iT, f_{ul})$ denotes the generic user channel vector during the i-th slot and where $n[i] \sim \mathcal{CN}(0, \sigma^2 I_M)$ is the measurement noise vector. Letting $N_0$ denote the (complex circularly symmetric) noise per-sample variance and $P_{ul}$ the transmit UL power per symbol, and using the fact that the UL pilot sequence includes K dimensions, the measurement noise variance is given by $\sigma^2 = N_0/(K P_{ul})$. Collecting a window of $N_{ul}$ UL measurements, we first calculate the sample covariance matrix $\tilde{C}_{ul} = \frac{1}{N_{ul}} \sum_{i=1}^{N_{ul}} y[i] y[i]^H$. Based on the expected Toeplitz structure of $C_{ul}$, we improve upon the sample covariance estimator by averaging over the diagonals of $\tilde{C}_{ul}$. The new estimator of the UL covariance matrix is given by $\hat{C}_{ul} = \mathcal{T}(\hat{c}_{ul})$, where

$$[\hat{c}_{ul}]_\ell = \frac{1}{M - \ell} \sum_{i - j = \ell} [\tilde{C}_{ul}]_{i,j}, \quad \ell \in [M]. \tag{6}$$
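The diagonal-averaging estimator in (6) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names and the synthetic white observations are assumptions, and indices are taken as ℓ = 0, ..., M − 1.

```python
import numpy as np

def toeplitz_hermitian(c):
    """Hermitian Toeplitz matrix T(c) with first column c (footnote 6)."""
    idx = np.arange(len(c))
    diff = idx[:, None] - idx[None, :]                 # i - j
    lower = c[np.abs(diff)]
    return np.where(diff >= 0, lower, np.conj(lower))

def toeplitz_sample_covariance(Y):
    """Y has shape (N_ul, M): one projected pilot observation y[i] per row."""
    N, M = Y.shape
    C_tilde = Y.T @ Y.conj() / N                       # (1/N) sum_i y[i] y[i]^H
    # Average the entries with i - j = l; there are M - l of them.
    c_hat = np.array([np.trace(C_tilde, offset=-l) / (M - l) for l in range(M)])
    return toeplitz_hermitian(c_hat), c_hat

rng = np.random.default_rng(0)                         # synthetic data (assumption)
Y = (rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))) / np.sqrt(2)
C_hat, c_hat = toeplitz_sample_covariance(Y)
```

The zero-lag entry `c_hat[0]` is simply the average received power per antenna, and higher lags are averaged over fewer entries, which is why the last few coefficients are the noisiest.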
This is a special case of the banding Toeplitz covariance estimator, which is extensively studied in the literature (see [40, 41]), and it is referred to here simply as “sample covariance estimator” since the following treatment does not depend critically on the specific covariance estimation employed, as long as it provides a good estimate.

The proposed method to obtain an estimate of the DL covariance matrix from $\hat{c}_{ul}$ consists of two steps: first, an estimate of the ASF over a dense angular grid is obtained by solving a convex program. Then, we use the estimated ASF to compute an estimate of the DL channel covariance.

A. Estimation of the channel ASF

Define $\mathcal{G}$ as a uniform grid consisting of $G \gg M$ discrete angular points $\{\theta_i\}_{i=1}^{G}$, where each point is given by $\theta_i = \sin^{-1}\big(\big(-1 + \frac{2(i-1)}{G}\big) \sin(\theta_{max})\big) \in \Theta$, and define $\mathbf{G} \in \mathbb{C}^{M \times G}$ to be a matrix whose i-th column is given by $\frac{1}{\sqrt{M}} a_{ul}(\theta_i)$, $i \in [G]$. A discrete approximation of the ASF γ on the grid $\mathcal{G}$ can be written as $\gamma(d\theta) \approx \sum_{i=1}^{G} w_i \delta(\theta - \theta_i)$ for some vector $w \in \mathbb{R}_+^G$. We find w by solving the following non-negative least squares (NNLS) convex optimization program [37]

$$w^* = \arg\min_{w \in \mathbb{R}_+^G} \|\mathbf{G} w - \hat{c}_{ul}\|. \tag{7}$$
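As a hedged illustration of (7), the sketch below builds the grid dictionary and solves the NNLS problem with plain projected gradient descent. Projected gradient is one simple choice among the convex techniques alluded to in the text, not necessarily the solver used by the authors; the function names, the problem sizes and the noiseless two-spike ASF are assumptions made for the example.

```python
import numpy as np

def asf_grid_dictionary(M, G, theta_max):
    """Grid angles theta_i and the M x G matrix with columns a_ul(theta_i)/sqrt(M)."""
    xi = -1.0 + 2.0 * np.arange(G) / G               # xi_i = sin(theta_i)/sin(theta_max)
    theta = np.arcsin(xi * np.sin(theta_max))
    m = np.arange(M)[:, None]
    D = np.exp(1j * np.pi * m * xi[None, :]) / np.sqrt(M)
    return theta, D

def nnls_projected_gradient(D, c, n_iter=2000):
    """Minimize ||D w - c||_2 over real w >= 0 by projected gradient descent."""
    A = np.vstack([D.real, D.imag])                  # equivalent real-valued problem
    b = np.concatenate([c.real, c.imag])
    step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = np.maximum(w - step * (A.T @ (A @ w - b)), 0.0)
    return w

M, G, theta_max = 16, 64, np.pi / 3                  # illustrative sizes (assumption)
theta, D = asf_grid_dictionary(M, G, theta_max)
w_true = np.zeros(G)
w_true[10], w_true[40] = 0.6, 0.4                    # synthetic two-cluster ASF (assumption)
c_ul = D @ w_true                                    # noiseless stand-in for the estimate c_hat_ul
w_star = nnls_projected_gradient(D, c_ul)
```

Because w is real, stacking the real and imaginary parts gives an ordinary real NNLS problem, so a dedicated solver such as `scipy.optimize.nnls` could be applied to the same stacked system. Since G ≫ M the minimizer is generally not unique, so `w_star` need not coincide with `w_true` even though it fits `c_ul` well.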
NNLS is known to be efficiently implementable via several convex optimization techniques [42]. After solving (7), the estimated discretized approximation of the ASF is simply given as $\hat{\gamma}(\theta_i) = [w^*]_i$.

B. Covariance extrapolation via Fourier transform resampling

Building on the theory developed in our companion paper [37], the problem of extrapolating the estimated UL covariance matrix to the DL frequency can be seen as the resampling of the Fourier transform of the channel ASF. To see this, notice that the m-th component of the first column $c_{ul}$ of $C_{ul}$ is given by

$$[c_{ul}]_m = \int_{\Theta} \gamma(d\theta)\, e^{j m \pi \frac{\sin\theta}{\sin\theta_{max}}} = \int_{-1}^{1} \gamma(d\xi)\, e^{j m \pi \xi}, \quad m \in [M], \tag{8}$$

where we introduce the change of variable $\xi = \frac{\sin\theta}{\sin\theta_{max}}$. Define the continuous Fourier transform of the positive measure $\gamma(d\xi)$ as $\check{\gamma}(x) = \int_{-1}^{1} \gamma(d\xi)\, e^{j x \pi \xi}$. Then it is clear from (8) that $[c_{ul}]_m = \check{\gamma}(m)$, $m \in [M]$. In words, the first column of the UL covariance matrix is simply a sampling of the Fourier transform of the positive measure $\gamma(d\xi)$ at points m = 0, . . . , M − 1. Taking similar steps, one can show that the components of the first column of the DL covariance matrix are given by $[c_{dl}]_m = \int_{-1}^{1} \gamma(d\xi)\, e^{j \alpha m \pi \xi}$, $m \in [M]$, and hence $[c_{dl}]_m = \check{\gamma}(\alpha m)$, $m \in [M]$. Estimating the DL covariance from the UL covariance is
equivalent to resampling γˇ (·) over a grid {0, α, 2α, . . . , (M − 1)α}, knowing its samples at the integer
grid {0, 1, 2, . . . , M −1}. Obviously, there is no hope to uniquely reconstruct the function at points beyond
the observation window [0, M − 1], unless additional regularity conditions are given [37]. Notice that this fundamental fact has been simply ignored in all the works that have proposed some ad-hoc method for
UL-DL covariance extrapolation (e.g., see [35]). On the other hand, the analysis in [37] shows that the resampling is accurate within the interval [0, M − 1]. This suggests that the DL covariance estimation
can be improved by truncating (i.e., setting to zero) the last few coefficients of the extrapolated DL autocorrelation function. Summarizing, the proposed DL covariance estimation method consists of the following steps:

1) Estimate a discrete approximation of the positive measure γ(θ) using the UL sample covariance estimator and solving (7). The samples of the Fourier transform of this measure on the grid {0, . . . , M − 1} asymptotically converge to those generated from the true angular scattering function [43] for large sample size $N_{ul}$.

2) Calculate the Fourier transform of the estimated measure on the grid $\{\alpha m\}_{m=0}^{m_{max}-1}$, where $m_{max} \le M$ is a suitable tail truncation index, to obtain the estimated DL antenna autocorrelation function

$$[\hat{c}_{dl}]_m = \sum_{i=1}^{G} \hat{\gamma}(\theta_i)\, e^{j \alpha (m-1) \pi \frac{\sin\theta_i}{\sin\theta_{max}}}, \quad m \in [m_{max}], \tag{9}$$

while setting $[\hat{c}_{dl}]_m = 0$ for all remaining indices. The resulting DL covariance matrix is given by the Toeplitz completion $\hat{C}_{dl} = \mathcal{T}(\hat{c}_{dl})$.
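Step 2 can be sketched as follows, assuming the ASF weights on the grid are already available (for instance from the NNLS step). The function names, the grid, the two-spike weights and the truncation index are illustrative assumptions; lags are indexed from 0, so the entry at lag ℓ corresponds to m − 1 = ℓ in (9).

```python
import numpy as np

def extrapolate_dl_autocorr(gamma_hat, xi, M, alpha, m_max):
    """Evaluate (9) at lags 0..m_max-1 and set the remaining M - m_max entries to zero.

    gamma_hat : non-negative ASF weights on the grid
    xi        : grid values sin(theta_i)/sin(theta_max)
    """
    lags = np.arange(m_max)[:, None]
    c_dl = np.zeros(M, dtype=complex)
    c_dl[:m_max] = np.exp(1j * np.pi * alpha * lags * xi[None, :]) @ gamma_hat
    return c_dl

def toeplitz_hermitian(c):
    """Hermitian Toeplitz completion T(c) with first column c."""
    idx = np.arange(len(c))
    diff = idx[:, None] - idx[None, :]
    lower = c[np.abs(diff)]
    return np.where(diff >= 0, lower, np.conj(lower))

M, G, m_max = 16, 64, 12                    # illustrative sizes (assumption)
alpha = 2140 / 1950                         # LTE-IMT carrier ratio quoted in the text
xi = -1.0 + 2.0 * np.arange(G) / G
gamma_hat = np.zeros(G)
gamma_hat[10], gamma_hat[40] = 0.6, 0.4     # synthetic normalized ASF (assumption)

c_dl = extrapolate_dl_autocorr(gamma_hat, xi, M, alpha, m_max)
C_dl = toeplitz_hermitian(c_dl)
```

Setting `alpha = 1` and `m_max = M` in the same function reproduces the UL autocorrelation samples, which makes the "resampling" interpretation concrete: only the sampling grid of the Fourier transform changes between UL and DL.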
As a final remark in this section, notice that the above DL covariance estimation method does not rely on particular features of the channel ASF. For example, it does not require that the ASF has a sparse or discrete support, as needed in other ad-hoc methods (e.g., see [35, 44, 45]).

C. Circulant approximation of the DL covariance matrices

The DL covariance estimation from UL pilot signals is performed for all the users $k \in [K]$ at the BS. These covariance matrices are Toeplitz by construction, due to the structure of the ULA as described before. In Section IV we will introduce the novel idea of active channel sparsification where, for a given limited DL pilot dimension, the BS selects a set of angular directions to transmit data to the users, such that the system multiplexing gain is maximized. A necessary step before performing sparsification is that all of the estimated DL covariance matrices share a common set of eigenvectors, namely, the same virtual beam-space representation. In the massive MIMO regime where $M \gg 1$, this is possible by considering the circulant approximation of Toeplitz matrices that follows as an application of Szegő's theorem (see details in [3] and references therein). Let $C_k$ denote the estimated DL channel covariance of user k for $k \in [K]$, where from now on we shall drop the subscript “dl” since it is clear from the context, as we consider only DL multiuser MIMO transmission. Define the diagonal matrices $\mathring{\Lambda}_k = \mathrm{diag}(F^H C_k F)$ for $k \in [K]$, where F denotes the M × M unitary DFT matrix. There are several ways to define a circulant approximation [46], among which we choose the following:

$$\mathring{C}_k = F \mathring{\Lambda}_k F^H. \tag{10}$$
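A minimal sketch of the circulant approximation (10), using the unitary DFT matrix with entries $e^{-j 2\pi mn/M}/\sqrt{M}$. The function names and the rank-one test covariance are assumptions made for illustration.

```python
import numpy as np

def dft_matrix(M):
    """Unitary DFT matrix with entries exp(-j*2*pi*m*n/M)/sqrt(M), m, n = 0..M-1."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * np.outer(m, m) / M) / np.sqrt(M)

def circulant_approximation(C):
    """Return F diag(lam) F^H and lam, where lam is the diagonal of F^H C F."""
    F = dft_matrix(C.shape[0])
    lam = np.real(np.diag(F.conj().T @ C @ F))   # real because C is Hermitian
    return F @ np.diag(lam) @ F.conj().T, lam

M = 8                                            # illustrative size (assumption)
a = np.exp(1j * np.pi * 0.3 * np.arange(M))      # single-direction response (assumption)
C = np.outer(a, a.conj())                        # rank-one Toeplitz covariance
C_circ, lam = circulant_approximation(C)
```

Because F is unitary, `lam` sums to the trace of `C`, so the approximation preserves the total channel power while redistributing it over the DFT beams.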
According to Szegő's theorem, for large M, $\mathring{\Lambda}_k$ converges to the diagonal eigenvalue matrix $\Lambda_k$ of $C_k$, i.e., $\mathring{\Lambda}_k \to \Lambda_k$ as $M \to \infty$. This shows that, with a small error, we can find a common set of eigenvectors for all the users' estimated DL covariance matrices. As a consequence, the DL channel covariance of user k is characterized simply via a vector of eigenvalues $\lambda^{(k)} \in \mathbb{R}^M$, with m-th element $\lambda^{(k)}_m = [\mathring{\Lambda}_k]_{m,m}$. In addition, the DFT matrix F, whose (m, n)-th entry is given by $[F]_{m,n} = \frac{1}{\sqrt{M}} e^{-j 2\pi \frac{mn}{M}}$, $m, n \in [M]$, forms a unitary basis for (approximately) expressing any user channel vector via an (approximated) Karhunen-Loève expansion. In particular, let $f_n := [F]_{\cdot,n}$ denote the n-th column of F. We can express the DL
channel vector of user k as

$$h^{(k)} \approx \sum_{n=0}^{M-1} g_n^{(k)} \sqrt{\lambda_n^{(k)}}\, f_n, \tag{11}$$

where $g_n^{(k)} \sim \mathcal{CN}(0, 1)$ are i.i.d. random variables.
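The expansion (11) gives a direct recipe for drawing channel realizations from a beam-domain power profile. The sketch below is illustrative only: the function name, the array size and the sparse eigenvalue vector are assumptions.

```python
import numpy as np

def sample_channel(lam, rng):
    """Draw h = sum_n g_n sqrt(lam_n) f_n with g_n i.i.d. CN(0, 1), as in (11)."""
    M = len(lam)
    m = np.arange(M)
    F = np.exp(-2j * np.pi * np.outer(m, m) / M) / np.sqrt(M)   # unitary DFT matrix
    g = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    return F @ (np.sqrt(lam) * g), F

M = 8                                   # illustrative size (assumption)
lam = np.zeros(M)
lam[2], lam[5] = 3.0, 1.0               # power on two beams only (assumption)
rng = np.random.default_rng(1)
h, F = sample_channel(lam, rng)
beam = F.conj().T @ h                   # beam-space coefficients of h
```

Projecting `h` back onto the DFT basis recovers coefficients that are non-zero only on the beams where `lam` is non-zero, which is the beam-space sparsity discussed next.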
The columns of F are very similar to array response vectors and in fact, recalling equation (2), we have that $f_n = \frac{1}{\sqrt{M}}\, a_{dl}\big(\sin^{-1}\big(\frac{\lambda_{dl}}{d} \frac{n}{M}\big)\big)$, with $\lambda_{dl}$ the DL carrier wavelength. Hence, each column with index $n \in [M]$ of the DFT matrix can be seen as the array response to an angular direction and $\lambda_n^{(k)}$ can be seen as the power of the channel vector associated with user k along that direction. Due to the limited number of local scatterers as seen at the BS and the large number of antennas of the array, only a few entries of $\lambda^{(k)}$ are significantly large, implying that the DL channel vector $h^{(k)}$ is sparse in the Fourier basis. This sparsity in the beam-space domain is precisely what has been exploited in the CS-based works discussed in Section I-A, in order to reduce the DL pilot dimension $T_{dl}$. It is also evident that this channel representation combined with the geometrically consistent model reviewed in Section II-A yields the common sparsity across users, as exploited by J-OMP in [17]. In the next section we propose the active channel sparsification method in order to maximize the system DL multiplexing gain for a fixed DL training dimension $T_{dl}$.

IV. ACTIVE CHANNEL SPARSIFICATION AND DL CHANNEL PROBING

In this section we consider the estimation of the instantaneous realization of the DL user channel vectors. As in [3], we consider the concatenation of the physical channel with a fixed precoder, i.e., a linear transformation that may depend on the user channel statistics (notably, on their covariance matrices estimated as explained in Section III), but is independent of the instantaneous channel realizations, which in fact must be estimated via the closed-loop DL probing and channel state feedback mechanism as discussed in Section I. The BS transmits a training space-time matrix Ψ of dimension $T_{dl} \times M'$, such that each row $\Psi_{i,\cdot}$ is
transmitted simultaneously from the M 0 ≤ M inputs of a precoding matrix B of dimension M 0 × M , and where M 0 is a suitable intermediate dimension that will be determined later. The precoded DL
training length (in time-frequency symbols) spans therefore Tdl dimensions, and the DL training phase is repeated at each DL slot of dimension T . Stacking the Tdl DL training symbols in a column vector, the corresponding observation at the UE k receiver is given by ˇ (k) + n(k) , y(k) = ΨBh(k) + n(k) = Ψh eff
(12)
12
ˇ (k) := Bh(k) as where B is the precoding matrix, h(k) is the channel vector of user k , and we define h eff
the effective channel vector, formed by the concatenation of the actual DL channel (antenna-to-antenna) with the precoder B. The measurement noise is AWGN with distribution n(k) ∼ CN (0, N0 ITdl ). The training matrix and precoding matrix are normalized such that
$$\mathrm{tr}\left(\boldsymbol{\Psi}\mathbf{B}\mathbf{B}^{\sf H}\boldsymbol{\Psi}^{\sf H}\right) = T_{dl} P_{dl}, \tag{13}$$
where Pdl denotes the total BS transmit power and we define the DL Signal to Noise Ratio as SNR = Pdl /N0 .
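As a small numerical illustration of the observation model (12) and the normalization (13), the following sketch builds a pilot matrix and one noisy observation for the special case $\mathbf{B} = \mathbf{I}_M$. The pilot construction (scaled rows of a unitary DFT matrix), the function names, and all numerical values are assumptions made only for this example and are not taken from the paper.

```python
import cmath
import math
import random

def dft_pilot(t_dl, m_prime, p_dl):
    """Hypothetical pilot: t_dl rows of a unitary DFT matrix, scaled so Psi Psi^H = p_dl * I."""
    scale = math.sqrt(p_dl / m_prime)
    return [[scale * cmath.exp(-2j * math.pi * i * n / m_prime)
             for n in range(m_prime)] for i in range(t_dl)]

def observe(psi, h_eff, n0, rng):
    """y = Psi h_eff + n with n ~ CN(0, n0 I), as in (12)."""
    sigma = math.sqrt(n0 / 2.0)  # standard deviation per real dimension
    return [sum(p * h for p, h in zip(row, h_eff))
            + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for row in psi]

def pilot_power(psi):
    """tr(Psi Psi^H), which equals the left-hand side of (13) when B B^H = I."""
    return sum(abs(p) ** 2 for row in psi for p in row)

rng = random.Random(0)
M, T_DL, P_DL, N0 = 8, 3, 10.0, 0.1          # illustrative values only
psi = dft_pilot(T_DL, M, P_DL)               # B = I_M, hence M' = M
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(M)]
y = observe(psi, h, N0, rng)                 # T_DL noisy pilot measurements
```

With this choice each pilot row carries power $P_{dl}$, so the trace in (13) evaluates to $T_{dl} P_{dl}$ and the DL SNR is $P_{dl}/N_0$.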
Notice that most works on channel estimation focus on the estimation of the actual channels {h(k) }.
This is recovered in our setting by letting B = IM . However, our goal here is to design a “sparsifying” precoder B such that each user effective channel has low dimension (in the beam-space representation)
and yet the collection of effective channels for k ∈ [K] form a high-rank matrix. In this way, each user
channel can be estimated using a small pilot overhead $T_{dl}$, but the BS is still able to serve many users using spatial multiplexing in the DL (in fact, as many as the rank of the effective matrix).

A. Necessity and implication of stable channel estimation

For simplicity of exposition, in this section we assume that the channel representation (11) holds exactly and that the eigenvalues $\boldsymbol{\lambda}^{(k)}$ are exactly sparse, with support $\mathcal{S}_k = \{n : \lambda_n^{(k)} \neq 0\}$ and sparsity level $s_k = |\mathcal{S}_k|$. We hasten to say that the above are convenient design assumptions, made in order to obtain a tractable problem, and that the precoder designed according to our simplifying assumption is applied to the actual physical channels. Under our assumptions, the channel vector $\mathbf{h}^{(k)}$ can be stably estimated from the observation (12) without any artificial sparsification, i.e., letting $\mathbf{B} = \mathbf{I}_M$, if and only if $T_{dl} \ge s_k$. This is mathematically
stated in the following lemma, proved in Appendix A.
Lemma 1: Consider the sparse Gaussian vector $\mathbf{h}^{(k)}$ with support $\mathcal{S}_k$, $|\mathcal{S}_k| = s_k$, given by the RHS of (11). Let $\hat{\mathbf{h}}^{(k)}$ denote any estimator for $\mathbf{h}^{(k)}$ based on the observation $\mathbf{y}^{(k)}$ defined in (12) with $\mathbf{B} = \mathbf{I}_M$, and let $\mathbf{R}_e = \mathbb{E}[(\mathbf{h}^{(k)} - \hat{\mathbf{h}}^{(k)})(\mathbf{h}^{(k)} - \hat{\mathbf{h}}^{(k)})^{\sf H}]$ denote the corresponding estimation error covariance matrix. $\mathbf{h}^{(k)}$ can be stably recovered from $\mathbf{y}^{(k)}$ if and only if $T_{dl} \ge s_k$ in the following strong sense: for $T_{dl} \ge s_k$ there exist random ensembles of pilot matrices for which

$$\mathbb{P}\Big( \bigcap_{\mathcal{S}_k : |\mathcal{S}_k| = s_k} \big\{ \lim_{N_0 \downarrow 0} \mathrm{tr}(\mathbf{R}_e) = 0 \big\} \Big) = 1,$$

while for $T_{dl} < s_k$, any ensemble of random pilot matrices yields $\mathbb{P}(\lim_{N_0 \downarrow 0} \mathrm{tr}(\mathbf{R}_e) = 0) = 0$ for all support sets $\mathcal{S}_k$ of cardinality $s_k$.

Notice that Lemma 1 asserts the existence of training matrices for which the channel estimation MSE vanishes w.h.p. as $N_0 \to 0$ irrespectively of the support set of size $s_k$ (one matrix is good for all support sets), provided that $T_{dl} \ge s_k$. Also, the converse statement (infeasibility of stable estimation for $T_{dl} < s_k$) applies to fixed deterministic pilot matrices as well, as a special case of random ensembles putting probability mass 1 on a single given matrix. It is important to note that the requirement of estimation stability is essential in order to preserve a non-trivial multiplexing gain of the multiuser MIMO DL, irrespectively of the DL precoding scheme. In fact, it is well-known that if the estimation MSE of the user channels does not vanish as $N_0 \downarrow 0$, then the
degrees of freedom of the underlying multiuser MIMO DL channel collapse to 1, i.e., for sufficiently high SNR the best strategy transmits to a single user, since any form of multiuser precoding would inevitably
lead to an interference limited regime, where the sum rate remains bounded while SNR → ∞ [47]. In
contrast, it is also well-known that when the channel estimation error vanishes as O(N0 ) for N0 ↓ 0,
the ideal degrees of freedom as if the channel was perfectly known are achievable by very simple linear precoding [10]. A possible solution to this problem consists of serving only the users whose channel support $s_k$ is not larger than $T_{dl}$. This is assumed implicitly in all CS-based schemes (see Section I-A), and represents a major intrinsic limitation of the CS-based approaches.

B. Sparsifying precoder optimization

Here, we wish to design the precoder $\mathbf{B}$ such that the support of the effective channels $\check{\mathbf{h}}_{\rm eff}^{(k)} = \mathbf{B}\mathbf{h}^{(k)}$ is not larger than $T_{dl}$, such that all users have a chance of being served. Let $\mathbf{H} = \mathbf{L} \odot \mathbf{G} \in \mathbb{C}^{M \times K}$ denote the matrix of DL channel coefficients expressed in the DFT basis (11), in which each column of $\mathbf{H}$ represents the coefficients vector of a user, where $\mathbf{L}$ is an $M \times K$ matrix with elements $[\mathbf{L}]_{m,k} = \sqrt{\lambda_m^{(k)}}$, where $\mathbf{G} \in \mathbb{C}^{M \times K}$ has i.i.d. elements $[\mathbf{G}]_{m,k} = g_m^{(k)} \sim \mathcal{CN}(0, 1)$, and where $\odot$ denotes the Hadamard (elementwise) product. Let $\mathbf{A}$ denote a one-bit thresholded version of $\mathbf{L}$, such that $[\mathbf{A}]_{m,k} = 1$ if $\lambda_m^{(k)} > \epsilon$, where $\epsilon > 0$ is a suitable small threshold, used to distinguish the components that are significantly larger than 0 from the “almost zero” ones, and consider the $M \times K$ bipartite graph $\mathcal{L} = (\mathcal{A}, \mathcal{K}, \mathcal{E})$ with adjacency matrix $\mathbf{A}$ and weights $w_{m,k} = \lambda_m^{(k)}$ on the edges $(m, k) \in \mathcal{E}$.
Given a pilot dimension $T_{dl}$, our goal consists in selecting a subgraph $\mathcal{L}' = (\mathcal{A}', \mathcal{K}', \mathcal{E}')$ of $\mathcal{L}$ in which each node on either side of the graph has a degree at least 1 and such that

1) For all $k \in \mathcal{K}'$ we have $\deg_{\mathcal{L}'}(k) \le T_{dl}$, where $\deg_{\mathcal{L}'}$ denotes the degree of a node in the selected subgraph.
2) The sum of weights of the edges adjacent to any node $k \in \mathcal{K}'$ in the subgraph $\mathcal{L}'$ is greater than a threshold, i.e. $\sum_{m \in \mathcal{N}_{\mathcal{L}'}(k)} w_{m,k} \ge P_0, \ \forall k \in \mathcal{K}'$, where $\mathcal{N}_{\mathcal{L}'}(k)$ denotes the set of neighbors in $\mathcal{L}'$ of node $k$.
3) The channel matrix $\mathbf{H}_{\mathcal{A}',\mathcal{K}'}$ obtained from $\mathbf{H}$ by selecting $a \in \mathcal{A}'$ (referred to as “selected beam directions”) and $k \in \mathcal{K}'$ (referred to as “selected users”) has large rank.

The first criterion enables the stable estimation of the effective channel of any selected user with only $T_{dl}$ common pilot dimensions and $T_{dl}$ complex symbols of feedback per selected user. The second criterion makes sure that the effective channel strength of any selected user is greater than a certain desired threshold. The third criterion is motivated by the fact that the DL multiplexing gain is given by $\mathrm{rank}(\mathbf{H}_{\mathcal{A}',\mathcal{K}'}) \times \max\{0, 1 - T_{dl}/T\}$, and it is obtained by serving a number of users equal to the rank of the effective channel matrix.
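The trade-off behind the third criterion can be made concrete with a one-line computation of the multiplexing gain expression above. The function name and the numerical values in the usage line are illustrative assumptions; only the formula itself comes from the text.

```python
def multiplexing_gain(rank, t_dl, t):
    """rank(H_{A',K'}) * max{0, 1 - T_dl / T}, as stated for the third criterion."""
    return rank * max(0.0, 1.0 - t_dl / t)

# Hypothetical numbers: an effective channel of rank 26 in slots of T = 168 symbols.
gain_short_training = multiplexing_gain(26, 35, 168)
gain_long_training = multiplexing_gain(26, 100, 168)
```

For a fixed rank, a shorter training phase leaves a larger fraction of each slot for data, which is why the design keeps $T_{dl}$ small while trying to preserve the rank.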
The following lemmas relate the rank of the effective channel matrix to a graph-theoretic parameter, namely, the size of the maximal matching.7

Lemma 2: [Skeleton or “CUR” decomposition [48]] Consider $\mathbf{H} \in \mathbb{C}^{M \times K}$, of rank $r$. Let $\mathbf{W}$ be an $r \times r$ non-singular intersection submatrix obtained by selecting $r$ rows and $r$ columns of $\mathbf{H}$. Then, we have

$$\mathbf{H} = \mathbf{C}\mathbf{U}\mathbf{R} \tag{14}$$

where $\mathbf{C} \in \mathbb{C}^{M \times r}$ and $\mathbf{R} \in \mathbb{C}^{r \times K}$ are the matrices of the selected columns and rows forming the intersection $\mathbf{W}$ and $\mathbf{U} = \mathbf{W}^{-1}$.
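Lemma 2 can be verified on a small example with exact rational arithmetic. The sketch below uses a hypothetical $3 \times 3$ matrix of rank 2 and a hand-picked non-singular $2 \times 2$ intersection; the matrix, the index choices, and the helper names are assumptions for illustration only.

```python
from fractions import Fraction

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(w):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = w
    det = a * d - b * c
    if det == 0:
        raise ValueError("intersection submatrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Rank-2 example: the second row is twice the first one.
H = [[Fraction(v) for v in row] for row in [[1, 2, 3], [2, 4, 6], [1, 0, 1]]]
rows, cols = [0, 2], [0, 1]                       # selected rows and columns
C = [[H[i][j] for j in cols] for i in range(len(H))]   # M x r selected columns
R = [H[i] for i in rows]                                # r x K selected rows
W = [[H[i][j] for j in cols] for i in rows]             # r x r intersection
U = inverse_2x2(W)
reconstruction = matmul(matmul(C, U), R)          # equals H, as in (14)
```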
Lemma 3: [Rank and perfect matchings] Let W denote an r×r matrix with some elements identically zero, and the non-identically zero elements independently drawn from a continuous distribution. Consider the associated bipartite graph with adjacency matrix A such that Ai,j = 1 if Wi,j is not identically zero, and Ai,j = 0 otherwise. Then, W has rank r with probability 1 if and only if the associated bipartite graph contains a perfect matching.
A similar theorem can be found in [49], but we provide a direct proof in Appendix B for the sake of completeness. Lemmas 2 and 3 result in the following corollary, which is an original albeit simple contribution of this work:

Corollary 1: The rank $r$ of a random matrix $\mathbf{H} \in \mathbb{C}^{M \times K}$ with either identically zero elements or elements independently drawn from a continuous distribution is given, with probability 1, by the size of the largest intersection submatrix whose associated bipartite graph (defined as in Lemma 3) contains a perfect matching.

7 A matching is a set of edges of a graph without common vertices.
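Corollary 1 can be illustrated numerically: the size of a maximum matching of the zero pattern should coincide with the rank of a random matrix with that pattern. The following is a minimal sketch, assuming a hypothetical $5 \times 4$ pattern; the matching is computed with a standard augmenting-path search and the rank with floating-point Gaussian elimination, both chosen here for brevity rather than taken from the paper.

```python
import random

def max_matching(adj):
    """Size of a maximum matching; adj[m][k] = 1 if row m and column k are adjacent."""
    n_rows, n_cols = len(adj), len(adj[0])
    match_of_col = [-1] * n_cols

    def augment(m, seen):
        for k in range(n_cols):
            if adj[m][k] and not seen[k]:
                seen[k] = True
                if match_of_col[k] < 0 or augment(match_of_col[k], seen):
                    match_of_col[k] = m
                    return True
        return False

    return sum(augment(m, [False] * n_cols) for m in range(n_rows))

def numerical_rank(mat, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting (tolerance is an assumption)."""
    a = [row[:] for row in mat]
    rank = 0
    for col in range(len(a[0])):
        pivot = max(range(rank, len(a)), key=lambda r: abs(a[r][col]), default=None)
        if pivot is None or abs(a[pivot][col]) < tol:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, len(a)):
            f = a[r][col] / a[rank][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank

# Hypothetical zero pattern: columns 2 and 3 share their only neighbour (row 3).
A = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
rng = random.Random(1)
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) if A[m][k] else 0j
      for k in range(4)] for m in range(5)]
matching, rank = max_matching(A), numerical_rank(H)
```

In this example both quantities equal 3, since the last two columns are proportional to each other and therefore contribute a single dimension.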
[Figure 2 (graphic not reproduced): panel (a) shows a bipartite graph between the coupled angular directions (A), a1 to a8, and the users (K), u1 to u5, with edge weights w_{m,k}; panel (b) shows the corresponding adjacency matrix W.]
Fig. 2: (a) An example of a bipartite graph $\mathcal{L}$. (b) The corresponding weighted adjacency matrix $\mathbf{W}$.

Obviously this corollary holds in our case where the non-zero elements of $\mathbf{H}$ are drawn from the complex Gaussian distribution. Using Corollary 1 this problem can be formulated as:

Problem 1: Let $T_{dl}$ denote the available DL pilot dimension and let $\mathcal{M}(\mathcal{A}', \mathcal{K}')$ denote a matching of the subgraph $\mathcal{L}'(\mathcal{A}', \mathcal{K}', \mathcal{E}')$ of the bipartite graph $\mathcal{L}(\mathcal{A}, \mathcal{K}, \mathcal{E})$. Find the solution of the following optimization problem:

$$\underset{\mathcal{A}' \subseteq \mathcal{A},\ \mathcal{K}' \subseteq \mathcal{K}}{\text{maximize}} \quad |\mathcal{M}(\mathcal{A}', \mathcal{K}')| \tag{15a}$$
$$\text{subject to} \quad \deg_{\mathcal{L}'}(k) \le T_{dl} \quad \forall k \in \mathcal{K}', \tag{15b}$$
$$\sum_{a \in \mathcal{N}_{\mathcal{L}'}(k)} w_{a,k} \ge P_0, \quad \forall k \in \mathcal{K}'. \tag{15c}$$
♦

In the following, we transform this problem into an equivalent tractable form, namely, into a mixed integer linear program (MILP), which can be efficiently solved by standard optimization tools. First, without loss of generality let’s assume that $\mathcal{L}$ contains no isolated nodes (since these would be discarded anyway). We also introduce the $|\mathcal{A}| \times |\mathcal{K}|$
weighted adjacency matrix W where [W]m,k = wm,k . An example of the bipartite graph L and its
corresponding weighted adjacency matrix W is illustrated in Figs. 2a and 2b.
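First, given the bipartite graph $\mathcal{L}(\mathcal{A}, \mathcal{K}, \mathcal{E})$, we select the subgraph $\mathcal{L}'(\mathcal{A}', \mathcal{K}', \mathcal{E}')$, so that the constraint (15b) is satisfied. We introduce the binary variables $\{x_m, m \in \mathcal{A}\}$ and $\{y_k, k \in \mathcal{K}\}$ to indicate if beam $m$ and user $k$ are selected, respectively. As such, the constraint (15b) is equivalent to the set of constraints:

$$x_m \le \sum_{k \in \mathcal{K}} [\mathbf{A}]_{m,k}\, y_k \quad \forall m \in \mathcal{A} \tag{16a}$$
$$y_k \le \sum_{m \in \mathcal{A}} [\mathbf{A}]_{m,k}\, x_m \quad \forall k \in \mathcal{K} \tag{16b}$$
$$\sum_{m \in \mathcal{A}} [\mathbf{A}]_{m,k}\, x_m \le T_{dl}\, y_k + M (1 - y_k) \quad \forall k \in \mathcal{K} \tag{16c}$$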
In particular, (16a) ensures that if the beam m is selected (i.e., xm = 1), there must be some k ∈ K
such that (m, k) ∈ E is selected as well, whereas if the beam m is not selected, then this constraint
is redundant. Similarly, in (16b) if the user k is selected (i.e., yk = 1), there must be some m ∈ A such that (m, k) ∈ E is selected as well. Furthermore, (16c) guarantees that if the user k is chosen (i.e.,
yk = 1), the number of chosen beams with xm = 1 is no more than Tdl , and otherwise this constraint is
redundant. Meanwhile, the constraint (15c) is written as:

$$P_0\, y_k \le \sum_{m \in \mathcal{A}} [\mathbf{W}]_{m,k}\, x_m \quad \forall k \in \mathcal{K} \tag{17}$$
which ensures that if the user $k$ is chosen (i.e., $y_k = 1$) then the sum of the weights of the selected beams (i.e., $m \in \mathcal{N}_{\mathcal{L}'}(k)$ if $x_m = 1$) is no less than $P_0$, while if the user $k$ is not chosen (i.e., $y_k = 0$) then this constraint is redundant. A closer look reveals that the constraint (17) renders (16b) redundant, because when $y_k = 1$ in (17) there must exist at least one $m \in \mathcal{A}$ with $x_m = 1$.
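The selection constraints can be checked mechanically for any candidate pair of binary vectors. The following sketch tests (16a), (16c) and (17) (recall that (16b) is implied by (17)); the function name, the toy weight matrix and the parameter values are assumptions made for illustration.

```python
def feasible(A, W, x, y, t_dl, p0):
    """Check (16a), (16c) and (17) for binary selections x (beams) and y (users)."""
    n_beams, n_users = len(A), len(A[0])
    big_m = n_beams  # plays the role of the constant M in (16c)
    for m in range(n_beams):
        # (16a): a selected beam needs at least one selected neighbouring user.
        if x[m] > sum(A[m][k] * y[k] for k in range(n_users)):
            return False
    for k in range(n_users):
        degree = sum(A[m][k] * x[m] for m in range(n_beams))
        strength = sum(W[m][k] * x[m] for m in range(n_beams))
        # (16c): a selected user sees at most t_dl selected beams.
        if degree > t_dl * y[k] + big_m * (1 - y[k]):
            return False
        # (17): a selected user keeps at least p0 of channel strength.
        if p0 * y[k] > strength:
            return False
    return True

# Toy weights (rows: beams, columns: users); user 0 is active on three beams.
W = [[0.5, 0.0, 0.0], [0.3, 0.6, 0.0], [0.2, 0.4, 0.0], [0.0, 0.0, 1.0]]
A = [[1 if w > 0 else 0 for w in row] for row in W]
ok_sparsified = feasible(A, W, [1, 1, 0, 1], [1, 1, 1], t_dl=2, p0=0.1)
ok_all_beams = feasible(A, W, [1, 1, 1, 1], [1, 1, 1], t_dl=2, p0=0.1)
```

In this toy case keeping all four beams violates (16c) for user 0, whereas dropping beam 2 leaves every selected user with at most two active beams.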
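Second, given the selected subgraph $\mathcal{L}'(\mathcal{A}', \mathcal{K}', \mathcal{E}')$, we find a matching $\mathcal{M}(\mathcal{A}', \mathcal{K}')$ with maximum cardinality. To this end, we introduce another set of binary variables $\{z_{m,k}, m \in \mathcal{A}, k \in \mathcal{K}\}$ to indicate if an edge $(m, k) \in \mathcal{E}$ is chosen to form the maximum matching in $\mathcal{L}'(\mathcal{A}', \mathcal{K}', \mathcal{E}')$. Following the canonical linear program formulation of the maximum cardinality matching for bipartite graphs, we translate the objective function in (15) into the following optimization problem

$$\underset{z_{m,k}}{\text{maximize}} \quad \sum_{m \in \mathcal{A}'} \sum_{k \in \mathcal{K}'} [\mathbf{A}]_{m,k}\, z_{m,k} \tag{18a}$$
$$\text{subject to} \quad \sum_{k \in \mathcal{K}'} [\mathbf{A}]_{m,k}\, z_{m,k} \le 1 \quad \forall m \in \mathcal{A}', \tag{18b}$$
$$\sum_{m \in \mathcal{A}'} [\mathbf{A}]_{m,k}\, z_{m,k} \le 1 \quad \forall k \in \mathcal{K}', \tag{18c}$$
$$z_{m,k} \in \{0, 1\} \quad \forall m \in \mathcal{A}', k \in \mathcal{K}'. \tag{18d}$$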
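Now, to transport the optimization problem on $\mathcal{L}'$ to the original setting on $\mathcal{L}$, we need to guarantee that $\mathcal{M}(\mathcal{A}', \mathcal{K}') \subseteq \mathcal{E}'$, i.e., $z_{m,k} = 1$ only if $m \in \mathcal{A}'$, i.e., $x_m = 1$, and $k \in \mathcal{K}'$, i.e., $y_k = 1$. This is obtained for a given configuration of the variables $\{x_m\}$ and $\{y_k\}$ which define $\mathcal{L}'$, by adding constraints to (18) and yields

$$\underset{z_{m,k}}{\text{maximize}} \quad \sum_{m \in \mathcal{A}} \sum_{k \in \mathcal{K}} [\mathbf{A}]_{m,k}\, z_{m,k} \tag{19a}$$
$$\text{subject to} \quad \sum_{k \in \mathcal{K}} [\mathbf{A}]_{m,k}\, z_{m,k} \le 1 \quad \forall m \in \mathcal{A}, \tag{19b}$$
$$\sum_{m \in \mathcal{A}} [\mathbf{A}]_{m,k}\, z_{m,k} \le 1 \quad \forall k \in \mathcal{K}, \tag{19c}$$
$$[\mathbf{A}]_{m,k}\, z_{m,k} \le x_m \quad \forall k \in \mathcal{K}, m \in \mathcal{A}, \tag{19d}$$
$$[\mathbf{A}]_{m,k}\, z_{m,k} \le y_k \quad \forall k \in \mathcal{K}, m \in \mathcal{A}, \tag{19e}$$
$$z_{m,k} \in \{0, 1\} \quad \forall m \in \mathcal{A}, k \in \mathcal{K}, \tag{19f}$$

where (19d)-(19e) impose that the edge set $\{(m, k) : z_{m,k} = 1\}$ should be a subset of $\mathcal{E}'$.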
A further inspection of these constraints yields the following equivalent simplified form:

$$\underset{z_{m,k}}{\text{maximize}} \quad \sum_{m \in \mathcal{A}} \sum_{k \in \mathcal{K}} z_{m,k} \tag{20a}$$
$$\text{subject to} \quad z_{m,k} \le [\mathbf{A}]_{m,k}, \quad \forall m \in \mathcal{A}, k \in \mathcal{K}, \tag{20b}$$
$$\sum_{k \in \mathcal{K}} z_{m,k} \le x_m, \quad \forall m \in \mathcal{A}, \tag{20c}$$
$$\sum_{m \in \mathcal{A}} z_{m,k} \le y_k, \quad \forall k \in \mathcal{K}, \tag{20d}$$
$$z_{m,k} \in \{0, 1\} \quad \forall m \in \mathcal{A}, k \in \mathcal{K}, \tag{20e}$$

where the additional constraint (20b) turns all the terms of the type $[\mathbf{A}]_{m,k} z_{m,k}$ in (19) into $z_{m,k}$ in (20), the constraint (20c) results from the combination of the constraints (19b) and (19d), and (20d) results from the combination of (19c) with (19e). As a matter of fact, the formulation in (20) can be seen as a modified maximum cardinality bipartite matching with selective vertices, in which the vertices with $x_m = 1$ and $y_k = 1$ are selected to participate in the maximum cardinality matching. From the constraints (20c) and (20d), we have $z_{m,k} = 0$ when $x_m = 0$ or $y_k = 0$. On the other hand, when $x_m = y_k = 1$, the optimization problem (20) is exactly a canonical formulation of maximum cardinality bipartite matching. Hence, irrespectively of the value of $x_m \in \{0, 1\}$ and $y_k \in \{0, 1\}$, the integer constraints $z_{m,k} \in \{0, 1\}$ can be relaxed to the linear constraints $z_{m,k} \in [0, 1]$ and the solution of the resulting MILP is guaranteed to be integral, and therefore coincides with the optimal solution of the original integer program (see Theorem 1 below).
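Finally, we add a regularizer $\sum_{m \in \mathcal{A}} x_m$ to the objective function in order to favor solutions using as many virtual beam directions as possible. From the above, Problem 1 with regularized objective function is equivalent to the following MILP:

$$\mathcal{P}_{\rm MILP}: \quad \underset{x_m,\, y_k,\, z_{m,k}}{\text{maximize}} \quad \sum_{m \in \mathcal{A}} \sum_{k \in \mathcal{K}} z_{m,k} + \sum_{m \in \mathcal{A}} x_m \tag{21a}$$
$$\text{subject to} \quad z_{m,k} \le [\mathbf{A}]_{m,k} \quad \forall m \in \mathcal{A}, k \in \mathcal{K}, \tag{21b}$$
$$\sum_{k \in \mathcal{K}} z_{m,k} \le x_m \quad \forall m \in \mathcal{A}, \tag{21c}$$
$$\sum_{m \in \mathcal{A}} z_{m,k} \le y_k \quad \forall k \in \mathcal{K}, \tag{21d}$$
$$\sum_{m \in \mathcal{A}} [\mathbf{A}]_{m,k}\, x_m \le T_{dl}\, y_k + M (1 - y_k) \quad \forall k \in \mathcal{K}, \tag{21e}$$
$$P_0\, y_k \le \sum_{m \in \mathcal{A}} [\mathbf{W}]_{m,k}\, x_m \quad \forall k \in \mathcal{K}, \tag{21f}$$
$$x_m \le \sum_{k \in \mathcal{K}} [\mathbf{A}]_{m,k}\, y_k \quad \forall m \in \mathcal{A}, \tag{21g}$$
$$x_m, y_k \in \{0, 1\} \quad \forall m \in \mathcal{A}, k \in \mathcal{K}, \tag{21h}$$
$$z_{m,k} \in [0, 1] \quad \forall m \in \mathcal{A}, k \in \mathcal{K}. \tag{21i}$$

The following result is proved in Appendix C:

Theorem 1: The solution of the problem $\mathcal{P}_{\rm MILP}$ stated in (21) always has binary-valued variables $\{z_{m,k}, m \in \mathcal{A}, k \in \mathcal{K}\}$.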
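For very small instances, the structure of (21) can be explored without an MILP solver. The sketch below enumerates all binary $(x, y)$, keeps those satisfying (21e)-(21g), and evaluates the inner maximization over $z$ as a maximum matching on the selected vertices, which is justified by the integrality stated in Theorem 1. This brute-force approach is an illustrative substitute for a real solver and scales exponentially; the toy weights and all names are assumptions.

```python
from itertools import product

def matching_size(A, x, y):
    """Maximum matching of the subgraph induced by selected beams and users."""
    n_beams, n_users = len(A), len(A[0])
    owner = [-1] * n_users  # owner[k] = beam currently matched to user k

    def augment(m, seen):
        for k in range(n_users):
            if A[m][k] and y[k] and not seen[k]:
                seen[k] = True
                if owner[k] < 0 or augment(owner[k], seen):
                    owner[k] = m
                    return True
        return False

    return sum(augment(m, [False] * n_users) for m in range(n_beams) if x[m])

def solve_by_enumeration(W, t_dl, p0, eps=1e-12):
    """Exhaustive search over (x, y); returns (objective, x, y) of one optimum."""
    n_beams, n_users = len(W), len(W[0])
    A = [[1 if w > eps else 0 for w in row] for row in W]
    best = None
    for x in product((0, 1), repeat=n_beams):
        for y in product((0, 1), repeat=n_users):
            ok = all(x[m] <= sum(A[m][k] * y[k] for k in range(n_users))
                     for m in range(n_beams))                           # (21g)
            for k in range(n_users):
                deg = sum(A[m][k] * x[m] for m in range(n_beams))
                power = sum(W[m][k] * x[m] for m in range(n_beams))
                ok = ok and deg <= t_dl * y[k] + n_beams * (1 - y[k])   # (21e)
                ok = ok and p0 * y[k] <= power                          # (21f)
            if not ok:
                continue
            value = matching_size(A, x, y) + sum(x)                     # (21a)
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Toy weights (rows: beams, columns: users); user 0 has three active beams.
W = [[0.5, 0.0, 0.0], [0.3, 0.6, 0.0], [0.2, 0.4, 0.0], [0.0, 0.0, 1.0]]
value, x_opt, y_opt = solve_by_enumeration(W, t_dl=2, p0=0.1)
```

In this toy instance user 0 cannot be probed with $T_{dl} = 2$ pilots over all of its three beams, yet the optimum keeps all three users by switching off one of those beams.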
C. Channel estimation and multiuser precoding

For a given set of user DL covariance matrices, let $\{x_m^*\}_{m=1}^{M}$ and $\{y_k^*\}_{k=1}^{K}$ denote the MILP solution and denote by $\mathcal{B} = \{m : x_m^* = 1\} = \{m_1, m_2, \ldots, m_{M'}\}$ the set of selected beam directions of cardinality $|\mathcal{B}| = M'$ and by $\mathcal{K} = \{k : y_k^* = 1\}$ the set of selected users of cardinality $|\mathcal{K}| = K'$. The resulting sparsifying precoding matrix $\mathbf{B}$ in (12) is simply obtained as $\mathbf{B} = \mathbf{F}_{\mathcal{B}}^{\sf H}$, where $\mathbf{F}_{\mathcal{B}} = [\mathbf{f}_{m_1}, \ldots, \mathbf{f}_{m_{M'}}]$ and $\mathbf{f}_m$ denotes the $m$-th column of the $M \times M$ unitary DFT matrix $\mathbf{F}$. Given a DFT column $\mathbf{f}_m$, we have

$$\mathbf{B}\mathbf{f}_m = \begin{cases} \mathbf{0} & \text{if } m \notin \mathcal{B} \\ \mathbf{u}_i & \text{if } m = m_i \in \mathcal{B} \end{cases}$$

where $\mathbf{u}_i$ denotes an $M' \times 1$ vector with all zero components but a single “1” in the $i$-th position. Using the above property and (11), the effective DL channel vectors take on the form

$$\check{\mathbf{h}}_{\rm eff}^{(k)} = \mathbf{B} \sum_{m \in \mathcal{S}_k} g_m^{(k)} \sqrt{\lambda_m^{(k)}}\, \mathbf{f}_m = \sum_{i : m_i \in \mathcal{B} \cap \mathcal{S}_k} \sqrt{\lambda_{m_i}^{(k)}}\, g_{m_i}^{(k)}\, \mathbf{u}_i. \tag{22}$$

In words, the effective channel of user $k$ is a vector with non-identically zero elements only at the positions corresponding to the intersection of the beam directions in $\mathcal{S}_k$, along which the physical channel of user $k$ carries positive energy, and in $\mathcal{B}$, selected by the sparsifying precoder. The non-identically zero elements are independent Gaussian coefficients $\sim \mathcal{CN}(0, \lambda_{m_i}^{(k)})$. Notice also that, by construction, the number of non-identically zero coefficients is $|\mathcal{B} \cap \mathcal{S}_k| \le T_{dl}$ and their positions (encoded in the vectors $\mathbf{u}_i$ in (22)) are known to the BS. Hence, the effective channel vectors can be estimated from the $T_{dl}$-dimensional DL pilot observation (12) with an estimation MSE that vanishes as $1/\mathsf{SNR}$.

The pilot observation in the form (12) is obtained at the user $k$ receiver. In this work, we assume that each user sends its pilot observation using $T_{dl}$ channel uses in the UL, using analog unquantized feedback, as analyzed for example in [10, 11]. At the BS receiver, after estimating the UL channel from the UL pilots, the BS can apply linear MMSE estimation and recover the channel state feedback, which takes on the same form of (12) with some additional noise due to the noisy UL transmission.8

With the above precoding, we have $\mathbf{B}\mathbf{B}^{\sf H} = \mathbf{I}_{M'}$. Furthermore, we can choose the DL pilot matrix $\boldsymbol{\Psi}$ to be proportional to a random unitary matrix of dimension $T_{dl} \times M'$, such that $\boldsymbol{\Psi}\boldsymbol{\Psi}^{\sf H} = P_{dl} \mathbf{I}_{T_{dl}}$. In this way, the DL pilot phase power constraint (13) is automatically satisfied. The estimation of $\check{\mathbf{h}}_{\rm eff}^{(k)}$ from the DL pilot observation (12) (with suitably increased AWGN variance due to the noisy UL feedback) is completely straightforward and shall not be treated here in detail.

For the sake of completeness, we conclude this section with the DL precoded data phase and the corresponding sum rate performance metric that we shall use in Section V for numerical analysis and comparison with other schemes. Let $\hat{\mathbf{H}}_{\rm eff} = [\hat{\mathbf{h}}_{\rm eff}^{(1)}, \ldots, \hat{\mathbf{h}}_{\rm eff}^{(K')}]$ be the matrix of the estimated effective DL channels for the selected users. We consider the ZF beamforming matrix $\mathbf{V}$ given by the column-normalized version of the Moore-Penrose pseudoinverse of the estimated channel matrix, i.e., $\mathbf{V} = \hat{\mathbf{H}}_{\rm eff}^{\dagger} \mathbf{G}^{1/2}$, where $\hat{\mathbf{H}}_{\rm eff}^{\dagger} = \hat{\mathbf{H}}_{\rm eff} \big(\hat{\mathbf{H}}_{\rm eff}^{\sf H} \hat{\mathbf{H}}_{\rm eff}\big)^{-1}$ and $\mathbf{G}$ is a diagonal matrix that makes the columns of $\mathbf{V}$ have unit norm. A channel use of the DL precoded data transmission phase at the $k$-th user receiver takes on the form

$$y^{(k)} = \big(\mathbf{h}^{(k)}\big)^{\sf H} \mathbf{B}^{\sf H} \mathbf{V} \mathbf{P}^{1/2} \mathbf{d} + n^{(k)}, \tag{23}$$

where $\mathbf{d} \in \mathbb{C}^{K' \times 1}$ is a vector of unit-energy user data symbols and $\mathbf{P}$ is a diagonal matrix defining the power allocation to the DL data streams. The transmit power constraint is given by $\mathrm{tr}(\mathbf{B}^{\sf H} \mathbf{V} \mathbf{P} \mathbf{V}^{\sf H} \mathbf{B}) = \mathrm{tr}(\mathbf{V}^{\sf H} \mathbf{V} \mathbf{P}) = \mathrm{tr}(\mathbf{P}) = P_{dl}$, where we used $\mathbf{B}\mathbf{B}^{\sf H} = \mathbf{I}_{M'}$ and the fact that $\mathbf{V}^{\sf H}\mathbf{V}$ has unit diagonal elements by construction. In particular, in the results of Section V we use the simple uniform power allocation $P_k = P_{dl}/K'$ to each $k$-th user data stream.

8 As an alternative, one can consider quantized feedback using $T_{dl}$ channel uses in the UL (see [10, 11] and references therein). Digital quantized feedback yields generally a better end-to-end estimation MSE in the absence of feedback errors. However, the effect of decoding errors on the channel state feedback is difficult to characterize in a simple manner since it depends on the specific joint source-channel coding scheme employed. Hence, in this work we restrict to the simple analog feedback.
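The two properties of the sparsifying precoder used in this subsection, namely $\mathbf{B}\mathbf{f}_m \in \{\mathbf{0}, \mathbf{u}_i\}$ and $\mathbf{B}\mathbf{B}^{\sf H} = \mathbf{I}_{M'}$, can be checked numerically. The sketch below assumes one particular sign and indexing convention for the unitary DFT matrix and a hypothetical set of selected beams; neither is specified in this section.

```python
import cmath
import math

def dft_column(m, size):
    """m-th column of a size x size unitary DFT matrix (sign convention assumed)."""
    return [cmath.exp(2j * math.pi * m * n / size) / math.sqrt(size)
            for n in range(size)]

def sparsifying_precoder(selected_beams, size):
    """B = F_B^H: row i is the conjugate transpose of the column f_{m_i}."""
    return [[v.conjugate() for v in dft_column(m, size)] for m in selected_beams]

def apply(B, vec):
    """Matrix-vector product B vec."""
    return [sum(b * v for b, v in zip(row, vec)) for row in B]

M = 8
beams = [1, 4, 6]                                 # hypothetical selected set, M' = 3
B = sparsifying_precoder(beams, M)
on_selected = apply(B, dft_column(4, M))          # close to u_2 = (0, 1, 0)
on_discarded = apply(B, dft_column(3, M))         # close to the zero vector
gram = [apply(B, [v.conjugate() for v in row]) for row in B]  # B B^H (Hermitian)
```

Because the selected rows of $\mathbf{F}^{\sf H}$ are orthonormal, the precoder neither amplifies nor colors the pilot power, which is what makes the constraint (13) hold with $\boldsymbol{\Psi}\boldsymbol{\Psi}^{\sf H} = P_{dl}\mathbf{I}_{T_{dl}}$.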
In the case of perfect ZF beamforming, i.e., for $\hat{\mathbf{H}}_{\rm eff} = \mathbf{H}_{\rm eff}$, we have that (23) reduces to

$$y^{(k)} = \sqrt{G_k P_k}\, d_k + n^{(k)}$$

where $G_k$ is the $k$-th diagonal element of the norm normalizing matrix $\mathbf{G}$, $P_k$ is the $k$-th diagonal element of the power allocation matrix $\mathbf{P}$, and $d_k$ is the $k$-th user data symbol. Since in general $\hat{\mathbf{H}}_{\rm eff} \neq \mathbf{H}_{\rm eff}$, due to a non-zero estimation error, the received symbol at user $k$ receiver is given by

$$y^{(k)} = b_{k,k}\, d_k + \sum_{k' \neq k} b_{k,k'}\, d_{k'} + n^{(k)},$$

where the coefficients $(b_{k,1}, \ldots, b_{k,K'})$ are given by the elements of the $1 \times K'$ row vector $\big(\mathbf{h}^{(k)}\big)^{\sf H} \mathbf{B}^{\sf H} \mathbf{V} \mathbf{P}^{1/2}$ in (23). Of course, in the presence of an accurate channel estimation we expect that $b_{k,k} \approx \sqrt{G_k P_k}$ and $b_{k,k'} \approx 0$ for $k' \neq k$.

For simplicity, in this paper we compare the performance of the proposed scheme with that of the state-of-the-art CS-based schemes in terms of the ergodic sum rate, assuming that all coefficients $(b_{k,1}, \ldots, b_{k,K'})$ are known to the corresponding receiver $k$. Including the DL training overhead, this yields the rate expression (see also [50])

$$R_{\rm sum} = \left(1 - \frac{T_{dl}}{T}\right) \sum_{k \in \mathcal{K}} \mathbb{E}\left[ \log\left( 1 + \frac{|b_{k,k}|^2}{1 + \sum_{k' \neq k} |b_{k,k'}|^2} \right) \right]. \tag{24}$$
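A single realization of the term inside the expectation in (24) can be evaluated as follows; averaging this quantity over many channel realizations would approximate the ergodic sum rate. The base-2 logarithm (rates in bits) and the example coefficient matrix are assumptions for illustration, and the unit noise variance follows the normalization written in (24).

```python
import math

def sum_rate(b, t_dl, t):
    """(1 - T_dl/T) * sum_k log2(1 + |b_kk|^2 / (1 + sum_{k' != k} |b_kk'|^2))."""
    total = 0.0
    for k, row in enumerate(b):
        signal = abs(row[k]) ** 2
        interference = sum(abs(v) ** 2 for j, v in enumerate(row) if j != k)
        total += math.log2(1.0 + signal / (1.0 + interference))
    return (1.0 - t_dl / t) * total

# Hypothetical interference-free case with two users and |b_kk|^2 = 3.
b_ideal = [[math.sqrt(3), 0.0], [0.0, math.sqrt(3)]]
rate_ideal = sum_rate(b_ideal, t_dl=42, t=168)

# Residual interference from imperfect channel estimates lowers the rate.
b_leaky = [[math.sqrt(3), 1.0], [1.0, math.sqrt(3)]]
rate_leaky = sum_rate(b_leaky, t_dl=42, t=168)
```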
V. SIMULATION RESULTS

In this section we compare the performance of our proposed method to the CS-based method proposed in [17] in terms of sum-rate. The method in [17] is based on common probing of the DL channel with random Gaussian pilots. The noisy vectors of probed values $\mathbf{y}^{(k)}$ of all the users $k = 1, \ldots, K$ are collected at the BS via analog feedback and the sparse channel coefficients are recovered using a joint orthogonal matching pursuit (J-OMP) technique. For brevity, in the following we refer to this method as the J-OMP method.

We consider $M = 256$ antennas at the BS, $K = 26$ users, and resource blocks of size $T = 168$ symbols. Two propagation geometries are considered: a geometry based on the GSCM channel model where scatterers are in the form of MPC clusters with angular support given by intervals in the angle domain, and a geometry with spiky (discrete) scatterers. In both scenarios, DL training and data transmission are the same and are summarized as follows: in our proposed method, the BS estimates the users’ UL covariance matrices after a period of UL pilot transmission via the sample covariance estimator (6). Then, it uses this information to estimate the DL covariance matrices via the method developed in Section III. Given the obtained DL channel covariance matrix estimates, we first perform the circulant approximation and extract the vector of approximate eigenvalues as in (10). Then, we compute the sparsifying precoder $\mathbf{B}$ via the MILP solution as given in Section IV-B. In the results presented here, we set the parameter $P_0$ in the MILP to a small value in order to favor the multiplexing gain (high rank of the resulting effective channel matrix).9 After probing the effective channel of the selected users along these active beam directions via a random unitary pilot matrix $\boldsymbol{\Psi}$, we recover them via a simple Least-Squares (aka Maximum-Likelihood) estimation. Eventually, for both J-OMP and our method, we compute the ZF beamforming matrix based on the obtained channel estimates. In addition, instead of considering all selected users, in both cases we apply the Greedy ZF user selection approach of [51], which yields a significant benefit when the number of users is close to the rank of the effective channel matrix.

The signal to noise ratio is defined as $\mathsf{SNR} = P_{dl}/N_0$ and during the simulations we consider ideal noiseless feedback for simplicity, i.e., we assume that the BS receives the measurements in (12) without extra noise.10 It is worthwhile to mention that during the channel estimation stage, unlike the J-OMP method, we do not assume that all the users share a particular MPC. The sparsity order of each channel vector is given as an input to the J-OMP method, but not to our proposed method. This represents a small genie-aided advantage of J-OMP, which we introduce here for simplicity.

9 This approach is appropriate in the medium to high-SNR regime. For low SNR, it is often convenient to increase $P_0$ in order to serve fewer users with a larger beamforming energy transfer per user.

10 Notice that by introducing noisy feedback the relative gain w.r.t. J-OMP is even larger, since CS schemes are known to be more noise-sensitive than plain Least-Squares estimation with known sparsity pattern.
[Figure 3 (graphic not reproduced): panel (a) plots the ASF γ(dξ) over ξ ∈ [−1, 1]; panel (b) plots Sum-Rate [bits/s/Hz] versus DL Pilot Dimension (Tdl) for the curves “Our Alg., Cdl Known”, “Our Alg., Cdl Estimated”, and “J-OMP”.]
Fig. 3: (a) ASF consisting of two MPC clusters. These clusters are chosen at random out of the total three clusters. The dotted cluster is the one left out. (b) Achievable sum-rate as a function of DL pilot dimension with SNR = 20 dB. Here the BS is equipped with M = 256 antennas and serves K = 26 users through a propagation geometry of block-type scatterers.
A. Diffuse Scattering

In this geometry we consider three MPC clusters with random locations within the angular range (parametrized by $\xi$ rather than $\theta$) $[-1, 1)$, each spanning an interval of length $|\xi_i| = 0.4$, $i = 1, 2, 3$. The ASF for each user is obtained by selecting at random two out of three such clusters, such that the overlap of the angular components among users is very significant. The ASF is equal to 1 over the angular intervals corresponding to the chosen MPCs and 0 elsewhere (see Fig. 3a). From the ASF, we can calculate the UL and DL covariance matrices using (4). The UL covariance matrix is used to generate random snapshots of the channel for the UL sample covariance estimation. The DL covariance matrix is used to generate random snapshots of the DL channels, which are then multiplied by the sparsifying precoder, and estimated according to the DL training and feedback scheme described above. Since the two selected clusters cover a fraction $2 \times 0.4 / 2 = 0.4$ of the angular range, the described arrangement results in each generated channel vector being roughly $s_k = 0.4 \times M \approx 102$-sparse.
Fig. 3b illustrates the achievable sum-rate for the case where we use our method with known DL
covariance matrix, the case where we use our method with estimated DL covariance, and the J-OMP method as a function of DL pilot dimension with SNR = 20 dB. We notice that the DL covariance estimation step imposes a relatively small degradation to the final system performance. This figure also shows that there is an optimal DL pilot dimension that maximizes the sum-rate. This optimal value is Tdl ≈ 35 for our proposed method and Tdl ≈ 100 for the J-OMP method. This emphasizes the necessity
of active channel sparsification: the inherent dimension of each channel vector is about 102, while the optimal common training pilot dimension is about 35. On the other hand, the CS-based J-OMP method fails to recover the DL channel appropriately, leading to a poor sum-rate, since this method has to spend a large pilot dimension for proper DL channel estimation (even exploiting sparsity and joint sparsity across the users).

[Figure 4 (graphic not reproduced): panel (a) plots the ASF γ(dξ) over ξ ∈ [−1, 1]; panel (b) plots Sum-Rate [bits/s/Hz] versus DL Pilot Dimension (Tdl) for the curves “Our Alg., Cdl Known”, “Our Alg., Cdl Estimated”, and “J-OMP”.]

Fig. 4: (a) ASF consisting of randomly located spikes. (b) Achievable sum-rate as a function of DL pilot dimension with SNR = 20 dB. Here the BS is equipped with M = 256 antennas and serves K = 26 users through a propagation geometry of spiky scatterers.

B. Spiky Scattering

In this geometry, we consider for each user a scattering function consisting of $s_i = 20$ randomly located delta functions in the $\xi$-parametrized angular range $[-1, 1)$, i.e. $\gamma(d\xi) = \sum_{i=1}^{20} \delta(\xi - \xi_i)$, $\xi_i \in [-1, 1)$.
The channel generation, training, and data transmission procedure is the same as previously explained. The achievable sum-rate vs DL pilot dimension with SNR = 20 dB is presented in Fig. 4b. As before, the proposed method achieves a better sum rate than J-OMP. Since the channels in this case are significantly sparser than in the previous example, the optimal achievable sum-rate is higher, because both methods need a far smaller DL pilot dimension than before. In addition, the relative gap between our method and J-OMP is reduced, since these highly sparse channels are the ideal setting for CS-based schemes.

VI. CONCLUSION

We presented a novel approach for FDD massive MIMO systems. Our approach exploits the reciprocity of the angular scattering function to estimate the covariance matrix of the users’ DL channels from the sample covariance matrix calculated from the UL pilots sent by the users to the BS. The estimated DL covariance
matrices of all the users can be approximately expressed in terms of a common system of covariance eigenvectors (beam-space representation). For the ULA setting considered here, such eigenvectors are the columns of a DFT matrix, and this representation incurs a vanishing error for large number of BS antennas $M$. This beam-space information allows the BS to smartly select a set of beams and users such that communication over the resulting effective channels is efficient even with a limited DL pilot budget. This beam-user selection procedure is referred to here as active channel sparsification and it is achieved via a newly formulated mixed integer linear program (MILP). Our simulation results show that the proposed method performs well even in cases where the available DL pilot dimension is far less than the inherent dimension of the channel vectors. This represents a fundamental improvement with respect to the state-of-the-art compressed sensing method, for which the DL pilot dimension (number of measurements) should always be larger than the channel sparsity order in the angle domain.

We conclude by mentioning that in this paper we focused on purpose on a simple single-cell scenario. When multiple cells are considered, inter-cell interference should be taken into account. However, unlike TDD systems where UL and DL across different cells are synchronous, and the limited pilot dimension yields pilot contamination (see [4–6]), in FDD systems there is no need for tight inter-cell synchronization. Hence, the inter-cell incoherent interference reflects into a higher level of the background noise, but it is taken into account in a completely straightforward manner (as always traditionally done in the analysis of cellular systems) since no pilot contamination appears in FDD systems.

APPENDIX A
PROOF OF LEMMA 1

The proof follows by using the sparse representation $\mathbf{h}^{(k)} = \sum_{m \in \mathcal{S}_k} g_m^{(k)} \sqrt{\lambda_m^{(k)}}\, \mathbf{f}_m$ (see (11)), which holds exactly by assumption. Estimating $\mathbf{h}^{(k)}$ is equivalent to estimating the vector of KL Gaussian i.i.d. coefficients $\mathbf{g}^{(k)} = (g_m^{(k)} : m \in \mathcal{S}_k) \in \mathbb{C}^{s_k \times 1}$. Define the $M \times s_k$ DFT submatrix $\mathbf{F}_{\mathcal{S}_k} = (\mathbf{f}_m : m \in \mathcal{S}_k)$, and the corresponding diagonal $s_k \times s_k$ matrix of the non-zero eigenvalues $\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}$. After some simple standard algebra, the MMSE estimation error covariance of $\mathbf{g}^{(k)}$ from $\mathbf{y}^{(k)}$ in (12) with $\mathbf{B} = \mathbf{I}_M$ can be written in the form

$$\tilde{\mathbf{R}}_e = \mathbf{I}_{s_k} - \big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}\big)^{1/2} \mathbf{F}_{\mathcal{S}_k}^{\sf H} \boldsymbol{\Psi}^{\sf H} \Big( \boldsymbol{\Psi} \mathbf{F}_{\mathcal{S}_k} \boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)} \mathbf{F}_{\mathcal{S}_k}^{\sf H} \boldsymbol{\Psi}^{\sf H} + N_0 \mathbf{I}_{T_{dl}} \Big)^{-1} \boldsymbol{\Psi} \mathbf{F}_{\mathcal{S}_k} \big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}\big)^{1/2}. \tag{25}$$

Using the fact that $\mathbf{R}_e = \mathbf{F}_{\mathcal{S}_k} \big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}\big)^{1/2} \tilde{\mathbf{R}}_e \big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}\big)^{1/2} \mathbf{F}_{\mathcal{S}_k}^{\sf H}$, such that $\mathrm{tr}(\mathbf{R}_e) = \mathrm{tr}\big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)} \tilde{\mathbf{R}}_e\big)$, we have that $\mathrm{tr}(\mathbf{R}_e)$ and $\mathrm{tr}(\tilde{\mathbf{R}}_e)$ have the same vanishing order with respect to $N_0$. In particular, it is sufficient to consider the behavior of $\mathrm{tr}(\tilde{\mathbf{R}}_e)$ as a function of $N_0$. Now, using the Sherman-Morrison-Woodbury matrix inversion lemma [52], after some algebra omitted for the sake of brevity we arrive at

$$\mathrm{tr}(\tilde{\mathbf{R}}_e) = s_k - \sum_{i=1}^{s_k} \frac{\mu_i}{N_0 + \mu_i}, \tag{26}$$

where $\mu_i$ is the $i$-th eigenvalue of the $s_k \times s_k$ matrix $\mathbf{A} = \big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}\big)^{1/2} \mathbf{F}_{\mathcal{S}_k}^{\sf H} \boldsymbol{\Psi}^{\sf H} \boldsymbol{\Psi} \mathbf{F}_{\mathcal{S}_k} \big(\boldsymbol{\Lambda}_{\mathcal{S}_k}^{(k)}\big)^{1/2}$.
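The dichotomy of Lemma 1 can be read directly from (26): each positive eigenvalue $\mu_i$ contributes a term $N_0/(N_0 + \mu_i)$ that vanishes with the noise, while each zero eigenvalue contributes a constant 1. A minimal sketch, with hypothetical eigenvalue lists standing in for the spectrum of $\mathbf{A}$:

```python
def error_trace(mus, n0):
    """tr(R~_e) = s_k - sum_i mu_i / (N0 + mu_i), as in (26); s_k = len(mus)."""
    return len(mus) - sum(mu / (n0 + mu) for mu in mus)

# Full-rank case (T_dl >= s_k): all eigenvalues positive, the error vanishes with N0.
full_rank = [error_trace([2.0, 1.0, 0.5], n0) for n0 in (1e-1, 1e-3, 1e-9)]

# Rank-deficient case (s_k = 4 but only T_dl = 2 non-zero eigenvalues):
# the error floor s_k - T_dl = 2 remains however small the noise is.
deficient = [error_trace([2.0, 1.0, 0.0, 0.0], n0) for n0 in (1e-1, 1e-3, 1e-9)]
```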
For $T_{dl} \ge s_k$, there exist ensembles of random pilot matrices $\boldsymbol{\Psi}$ such that $\mathbf{A}$ has rank $s_k$ with probability 1, for all support sets $\mathcal{S}_k$. For example, it is sufficient to choose $\boldsymbol{\Psi}$ to be i.i.d. Gaussian with components $\sim \mathcal{CN}(0, 1)$. In this case, $\mu_i > 0$ for all $i \in [s_k]$ and (26) vanishes as $O(N_0)$ as $N_0 \downarrow 0$. In contrast, if $T_{dl} < s_k$ the rank of $\mathbf{A}$ is at most $T_{dl}$ with probability 1 for any support set. In this case, $\lim_{N_0 \downarrow 0} \Big( s_k - \sum_{i=1}^{s_k} \frac{\mu_i}{N_0 + \mu_i} \Big) \ge s_k - T_{dl} > 0$.

APPENDIX B
PROOF OF LEMMA 3

The determinant of $\mathbf{W}$ is given by the expansion $\det(\mathbf{W}) = \sum_{\iota \in \pi_r} \mathrm{sgn}(\iota) \prod_i [\mathbf{W}]_{i,\iota(i)}$, where $\iota$ is a permutation of the set $\{1, 2, \ldots, r\}$, where $\pi_r$ is the set of all such permutations and where $\mathrm{sgn}(\iota)$ is either 1 or -1. The product $\prod_i [\mathbf{W}]_{i,\iota(i)}$ is non-zero only for the perfect matchings in the bipartite graph. Hence, if the bipartite graph contains a perfect matching, then $\det(\mathbf{W}) \neq 0$ with probability 1 (and $\mathrm{rank}(\mathbf{W}) = r$), since the non-identically zero entries of $\mathbf{W}$ are drawn from a continuous distribution. If it does not contain a perfect matching, then $\det(\mathbf{W}) = 0$ and therefore $\mathrm{rank}(\mathbf{W}) < r$.

APPENDIX C
PROOF OF THEOREM 1
It suffices to show that the $z_{m,k}$ are binary, given that $x_m$ and $y_k$ are binary. First, if either $x_m$, $m \in \mathcal{A}$, or $y_k$, $k \in \mathcal{K}$, is 0, then $z_{m,k} = 0$. So, we only need to focus on the case where $x_m = y_k = 1$, $m \in \mathcal{A}$, $k \in \mathcal{K}$. In that case, the constraints of $\mathcal{P}_{\rm MILP}$ with respect to $z_{m,k}$, $m \in \mathcal{A}$, $k \in \mathcal{K}$ form a convex polytope. This polytope is called the bipartite matching polytope, which is integral, i.e. all of its extreme points have integer (and in this case binary) values (see [53, Corollary 18.1b. and Theorem 18.2.]). Therefore, given $x_m, y_k \in \{0, 1\}$, $\forall m \in \mathcal{A}, k \in \mathcal{K}$, $\mathcal{P}_{\rm MILP}$ reduces to a linear program with respect to the variables $z_{m,k}$ and the optimal solutions are the integral extreme points of the corresponding polytope, and the proof is complete.
REFERENCES

[1] D. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge university press, 2005.
[2] T. L. Marzetta, “How much training is required for multiuser MIMO?” in Fortieth Asilomar Conference on Signals, Systems and Computers, 2006. ACSSC’06. IEEE, 2006, pp. 359–363.
[3] A. Adhikary, J. Nam, J.-Y. Ahn, and G. Caire, “Joint spatial division and multiplexing: the large-scale array regime,” IEEE Trans. on Inform. Theory, vol. 59, no. 10, pp. 6441–6463, 2013.
[4] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. on Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[5] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, 2014.
[6] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge University Press, 2016.
[7] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 74–80, 2014.
[8] S. Sesia, M. Baker, and I. Toufik, LTE-the UMTS long term evolution: from theory to practice. John Wiley & Sons, 2011.
[9] S. Malkowsky, J. Vieira, L. Liu, P. Harris, K. Nieman, N. Kundargi, I. C. Wong, F. Tufvesson, V. Öwall, and O. Edfors, “The World’s First Real-Time Testbed for Massive MIMO: Design, Implementation, and Validation,” IEEE Access, vol. 5, pp. 9073–9088, 2017.
[10] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “Multiuser MIMO achievable rates with downlink training and channel state feedback,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2845–2866, 2010.
[11] M. Kobayashi, N. Jindal, and G. Caire, “Training and feedback optimization for multiuser MIMO downlink,” IEEE Transactions on Communications, vol. 59, no. 8, pp. 2228–2240, 2011.
[12] H. Yin, D. Gesbert, M. Filippou, and Y. Liu, “A coordinated approach to channel estimation in large-scale multiple-antenna systems,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 264–273, 2013.
[13] D. J. Love, R. W. Heath, and T. Strohmer, “Grassmannian beamforming for multiple-input multiple-output wireless systems,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2735–2747, 2003.
[14] N. Jindal, “MIMO broadcast channels with finite-rate feedback,” IEEE Transactions on Information Theory, vol. 52, no. 11, pp. 5045–5060, 2006.
[15] Z. Jiang, A. F. Molisch, G. Caire, and Z. Niu, “Achievable rates of FDD massive MIMO systems with spatial channel correlation,” IEEE Transactions on Wireless Communications, vol. 14, no. 5, pp. 2868–2882, 2015.
[16] P. W. Chan, E. S. Lo, R. R. Wang, E. K. Au, V. K. Lau, R. S. Cheng, W. H. Mow, R. D. Murch, and K. B. Letaief, “The evolution path of 4G networks: FDD or TDD?” IEEE Communications Magazine, vol. 44, no. 12, pp. 42–50, 2006.
[17] X. Rao and V. K. Lau, “Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3261–3271, 2014.
[18] A. M. Sayeed, “Deconstructing multiantenna fading channels,” IEEE Transactions on Signal Processing, vol. 50, no. 10, pp. 2563–2579, 2002.
[19] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076, 2010.
[20] P.-H. Kuo, H. Kung, and P.-A. Ting, “Compressive sensing based channel feedback protocols for spatially-correlated massive antenna arrays,” in Wireless Communications and Networking Conference (WCNC), 2012 IEEE.
IEEE, 2012, pp. 492–497.
[21] M. S. Sim, J. Park, C.-B. Chae, and R. W. Heath, “Compressed channel feedback for correlated massive MIMO systems,” Journal of Communications and Networks, vol. 18, no. 1, pp. 95–104, 2016. [22] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Transactions on Signal Processing, vol. 63, no. 23, pp. 6169–6183, 2015. [23] Y. Ding and B. D. Rao, “Dictionary learning based sparse channel representation and estimation for FDD massive MIMO systems,” arXiv preprint arXiv:1612.06553, 2016. [24] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289– 1306, 2006. [25] E. J. Cand`es and M. B. Wakin, “An introduction to compressive sampling,” IEEE signal processing magazine, vol. 25, no. 2, pp. 21–30, 2008. [26] J. Chen and X. Huo, “Theoretical results on sparse representations of multiple-measurement vectors,” IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4634–4643, 2006. [27] Y. C. Eldar and H. Rauhut, “Average case analysis of multichannel sparse recovery using convex relaxation,” IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 505–519, 2010. [28] P. Kyritsi, D. C. Cox, R. A. Valenzuela, and P. W. Wolniansky, “Correlation analysis based on MIMO channel measurements in an indoor environment,” IEEE Journal on Selected areas in communications, vol. 21, no. 5, pp. 713–720, 2003. [29] F. Kaltenberger, D. Gesbert, R. Knopp, and M. Kountouris, “Correlation and capacity of measured multi-user MIMO channels,” in Personal, Indoor and Mobile Radio Communications, 2008. PIMRC 2008. IEEE 19th International Symposium on.
IEEE, 2008, pp. 1–5.
[30] J. Hoydis, C. Hoek, T. Wild, and S. ten Brink, “Channel measurements for large antenna arrays,” in Wireless Communication Systems (ISWCS), 2012 International Symposium on.
IEEE, 2012, pp. 811–815.
[31] X. Gao, O. Edfors, F. Rusek, and F. Tufvesson, “Linear pre-coding performance in measured very-large MIMO
28
channels,” in Vehicular Technology Conference (VTC Fall), 2011 IEEE.
IEEE, 2011, pp. 1–5.
[32] “ETSI TS 136 101 V14.3.0 (2017-04) - LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) radio transmission and reception (3GPP TS 36.101 version 14.5.0 Release 14).” [33] K. Hugl, K. Kalliola, and J. Laurila, “Spatial reciprocity of uplink and downlink radio channels in FDD systems,” Proc. COST 273 Technical Document TD (02), vol. 66, p. 7, 2002. [34] A. Ali, N. Gonz´alez-Prelcic, and R. W. Heath Jr, “Millimeter wave beam-selection using out-of-band spatial information,” arXiv preprint arXiv:1702.08574, 2017. [35] H. Xie, F. Gao, S. Jin, J. Fang, and Y.-C. Liang, “Channel estimation for TDD/FDD massive MIMO systems with channel covariance computing,” arXiv preprint arXiv:1710.00704, 2017. [36] J. Hoydis, S. Ten Brink, and M. Debbah, “Massive MIMO in the ul/dl of cellular networks: How many antennas do we need?” IEEE J. on Sel. Areas on Commun. (JSAC), vol. 31, no. 2, pp. 160–171, 2013. [37] S. Haghighatshoar, M. B. Khalilsarai, and G. Caire, “Multi-band covariance interpolation with applications in massive MIMO,” arXiv preprint arXiv:1801.03714, 2018. [38] L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin, F. Tufvesson, and P. De Doncker, “The COST 2100 MIMO channel model,” IEEE Wireless Communications, vol. 19, no. 6, pp. 92–99, 2012. [39] L. Zheng and D. N. C. Tse, “Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel,” IEEE Transactions on Information Theory, vol. 48, no. 2, pp. 359– 383, 2002. [40] T. T. Cai, Z. Ren, and H. H. Zhou, “Optimal rates of convergence for estimating toeplitz covariance matrices,” Probability Theory and Related Fields, vol. 156, no. 1-2, pp. 101–143, 2013. [41] H. Xiao, W. B. Wu et al., “Covariance matrix estimation for stationary time series,” The Annals of Statistics, vol. 40, no. 1, pp. 466–493, 2012. [42] D. P. Bertsekas and A. Scientific, Convex optimization algorithms.
Athena Scientific Belmont, 2015.
[43] S. Haghighatshoar and G. Caire, “Channel vector subspace estimation from low-dimensional projections,” arXiv preprint arXiv:1509.07469, 2015. [44] D. Vasisht, S. Kumar, H. Rahul, and D. Katabi, “Eliminating channel feedback in next-generation cellular networks,” in Proceedings of the 2016 conference on ACM SIGCOMM 2016 Conference.
ACM, 2016, pp.
398–411. [45] H. Xie, F. Gao, S. Zhang, and S. Jin, “A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model,” IEEE Transactions on Vehicular Technology, vol. 66, no. 4, pp. 3170– 3184, 2017. [46] Z. Zhu and M. B. Wakin, “On the asymptotic equivalence of circulant and toeplitz matrices,” IEEE Transactions on Information Theory, vol. 63, no. 5, pp. 2975–2992, 2017. [47] A. G. Davoodi and S. A. Jafar, “Aligned image sets under channel uncertainty: Settling conjectures on the collapse of degrees of freedom under finite precision CSIT,” IEEE Transactions on Information Theory, vol. 62,
29
no. 10, pp. 5603–5618, 2016. [48] S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin, “A theory of pseudoskeleton approximations,” Linear algebra and its applications, vol. 261, no. 1-3, pp. 1–21, 1997. [49] W. T. Tutte, “The factorization of linear graphs,” Journal of the London Mathematical Society, vol. 1, no. 2, pp. 107–111, 1947. [50] G. Caire, “On the ergodic rate lower bounds with applications to massive MIMO,” IEEE Transactions on Wireless Communications, vol. PP, no. 99, pp. 1–1, 2018. [51] G. Dimic and N. D. Sidiropoulos, “On downlink beamforming with greedy user selection: performance analysis and a simple new algorithm,” IEEE Transactions on Signal processing, vol. 53, no. 10, pp. 3857–3868, 2005. [52] R. A. Horn and C. R. Johnson, Matrix analysis.
Cambridge university press, 1990.
[53] A. Schrijver, Combinatorial optimization: polyhedra and efficiency. 2003, vol. 24.
Springer Science & Business Media,