IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (SUBMITTED)
Adaptive Chip Equalization for DS-CDMA Downlink with Receive Diversity

Frederik Petré 1*, Luc Deneire 3, Geert Leus 2‡, Marc Moonen 2, Marc Engels 1§

1 Interuniversity Micro-Electronics Center (IMEC), Wireless Systems (WISE), Kapeldreef 75, B-3001 Leuven, Belgium. E-mail : {frederik.petre,marc.engels}@imec.be
2 Katholieke Universiteit Leuven (KULeuven), Department of Electrical Engineering (ESAT), Kasteelpark Arenberg 10, B-3001 Leuven, Belgium. E-mail : {geert.leus,marc.moonen}@esat.kuleuven.ac.be
3 University of Nice, I3S Laboratory, Route des Lucioles 2000, F-06903 Sophia-Antipolis Cedex, France. E-mail : [email protected]

Submission date : January 17, 2003
Topics of Interest : • Diversity Techniques and Equalization • Space-Time Processing • Signal Separation and Interference Rejection

* Also Ph.D. student at KULeuven, E.E. Dept., ESAT/INSYS, supported by the I.W.T. (Belgium).
‡ Postdoctoral Fellow of the F.W.O.-Flanders (Belgium).
§ Also Professor at KULeuven, E.E. Dept., ESAT/INSYS.
January 16, 2003
DRAFT
Abstract
To cope with the Multi-User Interference (MUI) in the downlink of a Direct-Sequence Code Division Multiple Access (DS-CDMA) system like UMTS, chip equalization suppresses the Inter-Chip Interference (ICI) and hence restores the orthogonality of the user signals. However, effective ICI suppression calls for multi-channel reception at the mobile through receive diversity. Antenna diversity usually requires large antenna spacings for maximum diversity gain, and is therefore seldom implemented. As we will show, polarization diversity solves this spacing problem, while keeping the same benefits as antenna diversity. Assuming perfect Channel State Information (CSI), the ideal Minimum Mean Squared Error (MMSE) multi-channel chip equalizer realizes a significant performance improvement compared to the conventional RAKE receiver. However, in practice, CSI needs to be acquired or updated before the actual equalization and detection can take place. To tackle this CSI acquisition/updating problem, we propose two methods for direct chip equalizer design. The first method, coined pilot-trained, only exploits the presence of a code-multiplexed pilot. The second method, coined semi-blind, additionally exploits the knowledge of the unused spreading codes. For both methods, we devise Recursive Least Squares (RLS) as well as Least Mean Squares (LMS) type of adaptive algorithms. The different algorithms are compared both in terms of performance and complexity.

Keywords
DS-CDMA, downlink, chip equalization, polarization diversity, adaptive algorithms
I. INTRODUCTION
Wideband Direct-Sequence Code Division Multiple Access (DS-CDMA) has emerged as the predominant multiple access technique for 3G cellular systems because it increases capacity and facilitates network planning in a cellular network [1]. To some extent, these systems rely on the orthogonality of the spreading codes to separate the different user signals, assuming frequency-flat fading channels. However, for increasing chip rates, the underlying multi-path channels become more time dispersive, causing frequency-selective fading. Apart from multi-path diversity, frequency-selective multi-path channels also introduce Inter-Chip Interference (ICI) that destroys the orthogonality amongst users, and hence gives rise to so-called Multi-User Interference (MUI). Traditionally, a Maximum Ratio Combining (MRC) RAKE receiver collects the inherent multi-path diversity that comes with increased chip rate [2]. Although the RAKE receiver maximizes the received Signal-to-Noise Ratio (SNR), MUI severely deteriorates its performance. To actively suppress the MUI and achieve significant performance improvements relative to the RAKE receiver, different linear and non-linear multi-user detectors exploit joint knowledge of code, timing and channel information [3]. Although well suited for uplink reception at the base station, their stringent requirements on processing power and a-priori information can hardly be justified for downlink reception at the mobile station [4]. Moreover, practical adaptive implementations of linear multi-user
detectors require a constant code-correlation matrix for guaranteed convergence of the adaptive updating algorithm, and therefore call for symbol-periodic spreading (where the code period equals the symbol period) [5], [6], [7], [8].
In this paper, we focus on the DS-CDMA downlink, similar to the UMTS Terrestrial Radio Access (UTRA) Frequency Division Duplex (FDD) mode of the 3GPP standard [9], which exhibits four distinct features compared to its uplink counterpart. First, downlink transmission is characterized by symbol-aperiodic spreading (the code period is much longer than the symbol period), where each user's data symbol stream is spread by a periodic orthogonal Walsh-Hadamard spreading code that is user specific and subsequently scrambled by an aperiodic overlay scrambling code that is base station specific. Second, the chip sequences of the different users are multiplexed synchronously on the transmission channel. Third, the different user signals experience the same multi-path channel when propagating to the mobile station of interest. Therefore, the intra-cell MUI is essentially caused by the multi-path channel that destroys the orthogonality amongst the different user signals. Finally, the inter-cell MUI originates from a small number of interfering sources, namely the neighbouring base stations that are within range of the mobile station of interest. Hence, performing linear equalization at the chip-level to completely or partially restore the transmitted multi-user chip sequence, followed by a correlation with the desired user's spreading code, makes it possible to suppress both intra- and inter-cell MUI.
The idea of chip equalization was first introduced in [10], with designs for ideal (assuming perfect Channel State Information (CSI)) Zero-Forcing (ZF) and Minimum Mean Squared Error (MMSE) burst mode chip equalizers for a hybrid TD/CDMA system. Ideal ZF and MMSE serial chip equalizers for DS-CDMA have later been introduced in [11], [12], [13], [14].
In [14] and [12], their performance for a single-antenna mobile station is analyzed in a single-cell and a multi-cell context, respectively. In [11] and [13], this performance is improved by enabling antenna diversity at the mobile in a multi-cell context and under soft handover conditions, respectively. In this paper, we develop a general framework for multi-channel reception (receive diversity) at the mobile through a combination of antenna diversity, polarization diversity [15] and temporal oversampling. On the one hand, temporal oversampling generally gives rise to dependent subchannels, resulting in poor performance. On the other hand, antenna diversity requires large antenna spacings to achieve maximum diversity gain, and is seldom implemented. As we will show, polarization diversity solves this spacing problem by relying on co-located antennas, while keeping the same benefits as antenna diversity for realistic values of the relevant channel parameters.
Ideal MMSE chip equalizers that assume perfect CSI significantly outperform the classical RAKE receiver. In practice, CSI needs to be acquired or updated before the actual equalization and detection can take place. Moreover, imperfect CSI can significantly degrade the performance of MMSE chip equalization [16]. Channel estimation can be performed based on either a time-multiplexed [11], [17], [18] or a code-multiplexed pilot [19]. The channel can also be identified blindly through either a subspace method [20] or a cross-relation property [21]. However, the channel estimation approach leads to a high computational burden in a mobile system setup where the time-varying nature of the multi-path channel requires frequent recalculation of the equalizer coefficients. To partially solve this complexity problem, the MMSE burst mode chip equalizer can be implemented in two steps, namely a premultiplication step with the inverse of the data covariance matrix followed by a classical RAKE combining step [22]. Besides an adaptive algorithm to update the inverse of the data covariance matrix in the premultiplication step, this approach also requires an adaptive channel estimator in the RAKE combining step. The adaptive algorithm to update the inverse of the data covariance matrix can be further simplified by exploiting the band diagonal structure of the data covariance matrix, resulting from the finite channel length [23]. An alternative adaptive implementation of the MMSE chip equalizer that also requires channel knowledge is related to Griffiths' algorithm [24].
To tackle the CSI acquisition/updating problem differently, we propose two methods for direct chip equalizer design that circumvent the need for explicit channel estimation. The first method, coined pilot-trained, only exploits the presence of a code-multiplexed pilot. The use of a code-multiplexed pilot for direct chip equalizer design was first suggested but not pursued in [25]. The second method, coined semi-blind, additionally exploits the knowledge of the unused spreading codes.
For both methods, we devise Recursive Least Squares (RLS) as well as Least Mean Squares (LMS) type of adaptive algorithms. We investigate their performance in a single-cell as well as a multi-cell context. Parts of this work were presented in our earlier publications [26], [27], [28].
The paper is organized as follows. Section II lays out the system model for the downlink of a multi-cell DS-CDMA system including receive diversity at the mobile. Section III reviews ideal ZF and MMSE multi-channel chip equalizers for intra-cell and inter-cell MUI suppression. Section IV develops our practical chip equalizer design methods, while Section V compares their performance under different scenarios. Finally, Section VI summarizes our results and formulates the main conclusions.
Notation : We use roman letters to represent scalars, lower boldface letters to denote row vectors, and upper boldface letters to denote matrices. $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ represent conjugate, transpose, and Hermitian, respectively. Further, $|\cdot|$ and $\|\cdot\|$ represent the absolute value and Frobenius norm, respectively. We reserve $E\{\cdot\}$ for expectation, $\lfloor\cdot\rfloor$ for integer flooring, $\lceil\cdot\rceil$ for integer ceiling, and $\star$ for linear convolution.
Subscripts $n_r$ and $n_p$ point to the $n_r$-th receive antenna and the $n_p$-th polarization, respectively. Subscripts $m$, $p$, and $d$ point to the $m$-th user, the pilot, and the data, respectively. Superscript $j$ refers to the $j$-th base station. Arguments $i$ and $n$ denote symbol and chip indices, respectively.

II. SYSTEM MODEL
We consider the downlink of a multi-cell DS-CDMA system that consists of several base stations (BS), each serving the different active mobile stations (MS) within its cell area. Since our main focus is a feasibility study of receive diversity at the MS, we assume that the BSs are equipped with a single transmit antenna. It is possible, however, to extend the proposed concepts to multi-antenna BSs, to improve the performance through space-time coding [29], [30], on the one hand, or to increase the spectral efficiency through spatial multiplexing [31], on the other hand. As shown in Figure 1 for the $j$-th BS, each BS transmits a synchronous code division multiplex, employing short orthogonal Walsh-Hadamard spreading codes that are MS-specific and a long overlay scrambling code that is BS-specific, similar to the UTRA FDD mode of the 3GPP standard [9]. With $N$ the spreading factor (and equivalently the number of chips per symbol), the multi-user chip vector $\mathbf{x}^j[i] := [x^j[iN], \ldots, x^j[(i+1)N-1]]$, transmitted by the $j$-th BS at the $i$-th symbol instant, sums the $M^j$ active user chip vectors and a continuous pilot chip vector :
$$\mathbf{x}^j[i] := \mathbf{s}_d^j[i] \cdot \mathbf{A}_d^j \cdot \mathbf{C}_d^j[i] + s_p^j[i] \cdot A_p^j \cdot \mathbf{c}_p^j[i], \tag{1}$$
where the $1 \times M^j$ multi-user data symbol vector $\mathbf{s}_d^j[i] := [s_1^j[i], \ldots, s_{M^j}^j[i]]$ stacks the data symbols of the different active users, the diagonal amplitude matrix $\mathbf{A}_d^j := \mathrm{diag}(A_1^j, \ldots, A_{M^j}^j)$ controls the transmit power of the different active users (downlink power control), and the $M^j \times N$ active multi-user code matrix $\mathbf{C}_d^j[i] := [\mathbf{c}_1^j[i]^T, \ldots, \mathbf{c}_{M^j}^j[i]^T]^T$ stacks the composite code vectors of the different active users. The $m$-th user's composite code vector $\mathbf{c}_m^j[i] := [c_m^j[iN], \ldots, c_m^j[(i+1)N-1]]$ is the componentwise multiplication of the periodic Walsh-Hadamard spreading code $\acute{\mathbf{c}}_m := [\acute{c}_m[0], \ldots, \acute{c}_m[N-1]]$ that is user-specific and the aperiodic overlay scrambling code $\mathbf{c}_s^j[i] := [c_s^j[iN], \ldots, c_s^j[(i+1)N-1]]$ that is BS-specific. The pilot symbol $s_p^j[i]$, the pilot amplitude $A_p^j$, and the pilot composite code vector $\mathbf{c}_p^j[i]$ are similarly defined as $s_m^j[i]$, $A_m^j$, and $\mathbf{c}_m^j[i]$, respectively. Due to the orthonormality of the user and pilot composite code vectors at each symbol instant, we have that $\mathbf{C}_d^j[i] \cdot \mathbf{C}_d^j[i]^H = \mathbf{I}_{M^j}$ and $\mathbf{C}_d^j[i] \cdot \mathbf{c}_p^j[i]^H = \mathbf{0}_{M^j \times 1}$. For convenience, we also define the $\bar{M}^j \times N$ inactive multi-user code matrix $\bar{\mathbf{C}}_d^j[i] := [\bar{\mathbf{c}}_1^j[i]^T, \ldots, \bar{\mathbf{c}}_{\bar{M}^j}^j[i]^T]^T$ that stacks the $\bar{M}^j := N - M^j - 1$ unused composite code vectors $\bar{\mathbf{c}}_{\bar{m}}^j[i] := \mathbf{c}_{M^j+\bar{m}}^j[i]$, with $\bar{m} \in \{1, \ldots, \bar{M}^j\}$. It is important to note that $\mathbf{c}_p^j[i]$, $\mathbf{C}_d^j[i]$, and $\bar{\mathbf{C}}_d^j[i]$ together form an orthonormal basis of the $N$-dimensional code space. The pilot symbols, which are a-priori known to the receiver, can be exploited to estimate the complex channel coefficients for application in, e.g., a Maximum Ratio Combining (MRC) RAKE receiver [19], [32]. Alternatively, as will be explained in Section IV, they can also be used to directly design the coefficients of a chip equalizer. Finally, the multi-user chip sequence $x^j[n]$ is pulse-shaped to the corresponding continuous-time signal $x^j(t) := \sum_{n=-\infty}^{\infty} x^j[n]\,\psi(t - nT_c)$, where $T_c$ is the chip period and $\psi(t)$ is the transmit filter.
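As an illustration of the transmit model in (1), the following minimal sketch builds orthonormal composite codes for a single symbol interval and checks that despreading recovers the symbols. The spreading factor, number of users, amplitudes and the QPSK scrambling chips are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def walsh_hadamard(n):
    """n x n Walsh-Hadamard matrix (n a power of two) with unit-norm rows."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

rng = np.random.default_rng(0)
N, M = 8, 3                        # spreading factor, active users (assumed values)
W = walsh_hadamard(N)
# Hypothetical unit-modulus scrambling chips for one symbol interval.
scramble = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, N)))
codes = W * scramble               # each row: spreading code times scrambling code

c_p = codes[0]                     # pilot composite code vector
C_d = codes[1:M + 1]               # M x N active multi-user code matrix
C_bar = codes[M + 1:]              # (N - M - 1) x N unused composite codes

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
s_d = rng.choice(qpsk, M)          # data symbols of the active users
s_p = qpsk[0]                      # known pilot symbol
A_d = np.diag([1.0, 0.5, 2.0])     # per-user amplitudes (assumed)
A_p = 1.5                          # pilot amplitude (assumed)

# Equation (1): multi-user chip vector for one symbol interval.
x = s_d @ A_d @ C_d + s_p * A_p * c_p

# Despreading with the conjugated codes separates users and pilot.
s_hat = (x @ C_d.conj().T) / np.diag(A_d)
pilot_hat = (x @ c_p.conj()) / A_p
```

On an ideal (distortion-free) channel the unused codes in `C_bar` carry no energy, which is the property the semi-blind method relies on.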
At the MS, we enable multi-channel reception (receive diversity) through a combination of $N_r$-fold antenna diversity, $N_p$-fold polarization diversity and $N_s$-fold temporal oversampling. The use of multiple receive antennas at the MS to boost downlink DS-CDMA performance has also been considered in [11], [13]. However, antenna diversity requires large antenna spacings to create independent subchannels, and is seldom implemented. Alternatively, polarization diversity solves this spacing problem by using co-located antennas, while keeping the same benefits as antenna diversity, as we will show. Polarization diversity deploys a dual-polarized antenna (consisting of two co-located antennas) that captures signals along orthogonal polarizations [15]. It relies on the ability of the propagation environment between the BS and the MS to depolarize and decorrelate the transmitted signal and hence to couple part of the transmitted signal's energy into its orthogonal polarization. Upon combining both $N_r$-fold antenna and $N_p$-fold polarization diversity, the multi-path propagation channel between the $j$-th BS and the $n_p$-th polarization of the $n_r$-th receive antenna can be represented by :
$$h^j_{n_r,n_p}(t) = \sum_{r=0}^{R^j} \beta^j_{n_r,n_p,r}\,\delta(t - \tau_r^j), \tag{2}$$
where $R^j + 1$ is the number of propagation paths, and $\beta^j_{n_r,n_p,r}$ and $\tau_r^j$ are the complex gain and the delay of the $r$-th path, respectively. It is convenient to fix $\tau_0^j = 0$ and to express each remaining delay as an excess delay relative to the 0-th delay. The performance of polarization diversity mainly depends on two parameters, namely the Cross Polarization Discrimination and the Cross Correlation Coefficient [33]. On the one hand, the Cross Polarization Discrimination (XPD) is the ratio of the average received signal power in one polarization versus the average received signal power in the orthogonal polarization. It measures the degree of depolarization that takes place between the transmitted and received signal :
$$\mathrm{XPD}^j_{n_r} := \frac{\sum_{r=0}^{R^j} E\{|\beta^j_{n_r,2,r}|^2\}}{\sum_{r=0}^{R^j} E\{|\beta^j_{n_r,1,r}|^2\}}. \tag{3}$$
The optimal value of the XPD that achieves the best diversity performance is $\mathrm{XPD}^j_{n_r} = 1$ (or equivalently $\mathrm{XPD}^j_{n_r} = 0$ dB), representing equal average received signal power on both orthogonal polarizations. On the other hand, the Cross Correlation Coefficient (XCC) measures how well the received signals at the different polarizations are decorrelated :
$$\mathrm{XCC}^j_{n_r} := \frac{E\{\beta^j_{n_r,1,r}\,\beta^{j*}_{n_r,2,r}\}}{\sqrt{E\{|\beta^j_{n_r,1,r}|^2\}}\,\sqrt{E\{|\beta^j_{n_r,2,r}|^2\}}}. \tag{4}$$
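The two polarization parameters in (3) and (4) can be estimated from sample path gains by replacing expectations with averages. The sketch below does this for synthetic Rayleigh-fading gains; the generator model and the target values 0.5 and 0.3 are assumptions chosen only for illustration.

```python
import numpy as np

def xpd_xcc(beta1, beta2):
    """Sample estimates of the XPD in (3) and the per-path XCC in (4).

    beta1, beta2: complex arrays of shape (trials, paths) with the path
    gains on polarization 1 and 2 of one receive antenna.
    """
    p1 = np.mean(np.abs(beta1) ** 2, axis=0)   # E{|beta_1,r|^2} per path
    p2 = np.mean(np.abs(beta2) ** 2, axis=0)   # E{|beta_2,r|^2} per path
    xpd = p2.sum() / p1.sum()
    xcc = np.mean(beta1 * beta2.conj(), axis=0) / np.sqrt(p1 * p2)
    return xpd, xcc

rng = np.random.default_rng(0)
trials, paths = 20000, 3
xpd_true, rho = 0.5, 0.3          # hypothetical target XPD and XCC

def cgauss(shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

beta1 = cgauss((trials, paths))
# Polarization 2: scaled in power by xpd_true and correlated with beta1 by rho.
beta2 = np.sqrt(xpd_true) * (rho * beta1 + np.sqrt(1 - rho ** 2) * cgauss((trials, paths)))

xpd, xcc = xpd_xcc(beta1, beta2)
print("XPD [dB]:", 10 * np.log10(xpd), " |XCC| per path:", np.abs(xcc))
```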
In order to achieve the best diversity performance, the value of the XCC should preferably be $\mathrm{XCC}^j_{n_r} = 0$, denoting completely uncorrelated received signals on the orthogonal polarizations. The continuous-time received signal at the $n_p$-th polarization of the $n_r$-th receive antenna can now be expressed as follows :
$$y_{n_r,n_p}(t) = \sum_{j=1}^{J} h^j_{n_r,n_p}(t - \tau^j) \star x^j(t) + e_{n_r,n_p}(t), \tag{5}$$
where $J$ is the number of active adjacent base stations, and $\tau^j$ is the propagation delay from the $j$-th BS to the MS of interest. The additive noise $e_{n_r,n_p}(t)$ accounts for the receiver thermal noise as well as the interference originating from distant base stations. The Additive White Gaussian Noise (AWGN) assumption for $e_{n_r,n_p}(t)$ is well justified due to the interference averaging effect of DS-CDMA. The model in (5) is general enough to capture all important working modes of the MS. Whenever the MS is close to the center of the cell, a single adjacent BS ($J = 1$) will be important. However, in case the MS approaches the edge of the cell, at least two adjacent base stations ($J \geq 2$) will become important. Each received signal $y_{n_r,n_p}(t)$, with $n_r \in \{1, \ldots, N_r\}$ and $n_p \in \{1, \ldots, N_p\}$, is subsequently filtered by the receive filter $\bar{\psi}(t)$ that is matched to the transmit filter $\psi(t)$. Let $h^j_{n_r,n_p}[l] := (\psi \star h^j_{n_r,n_p} \star \bar{\psi})(t - \tau^j)|_{t=lT_c}$ be the chip-sampled equivalent discrete-time FIR channel from the $j$-th BS to the $n_p$-th polarization of the $n_r$-th receive antenna. The FIR channel $h^j_{n_r,n_p}[l]$ models the multi-path propagation channel including the effect of the transmit and receive filters as well as the propagation delay. The chip-sampled discrete-time received signal $y_{n_r,n_p}[n] := (y_{n_r,n_p} \star \bar{\psi})(t)|_{t=nT_c}$ is the superposition of channel-distorted versions of the signals from the $J$ adjacent base stations, and can be written as :
$$y_{n_r,n_p}[n] = \sum_{j=1}^{J} h^j_{n_r,n_p}[n] \star x^j[n] + e_{n_r,n_p}[n] = \sum_{j=1}^{J} \sum_{l=\delta^j}^{\delta^j+L^j} h^j_{n_r,n_p}[l]\, x^j[n-l] + e_{n_r,n_p}[n], \tag{6}$$
where $\delta^j$ and $L^j$ are the delay index and the order of the FIR channel $h^j_{n_r,n_p}[l]$, respectively, and $e_{n_r,n_p}[n] := (e_{n_r,n_p} \star \bar{\psi})(t)|_{t=nT_c}$ denotes the additive Gaussian noise which we assume to be white with variance $\sigma_e^2$. The delay index $\delta^j$ can be written as $\delta^j = \lceil \tau^j / T_c \rceil$. The channel order $L^j$ can be derived as $L^j = \lceil (\tau^j + \tau^j_{max} + 2T_{filter}) / T_c \rceil - \delta^j$, where $\tau^j_{max}$ is the maximum excess delay of the propagation channel $h^j_{n_r,n_p}(t)$, and $T_{filter}$ is the length of the transmit/receive filter.
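A small worked example of the delay index and channel order formulas may help fix ideas. All numerical values below are hypothetical and expressed in chip periods (so $T_c = 1$); the function names are ours.

```python
import math

def delay_index(tau, Tc):
    """Delay index: ceil(tau / Tc)."""
    return math.ceil(tau / Tc)

def channel_order(tau, tau_max, T_filter, Tc):
    """Channel order: ceil((tau + tau_max + 2*T_filter) / Tc) minus the delay index."""
    return math.ceil((tau + tau_max + 2 * T_filter) / Tc) - delay_index(tau, Tc)

# Assumed values: propagation delay 3.7 chips, maximum excess delay 5.2 chips,
# and a transmit/receive filter length of 4 chips.
Tc = 1.0
delta = delay_index(3.7, Tc)              # ceil(3.7) = 4
L = channel_order(3.7, 5.2, 4.0, Tc)      # ceil(16.9) - 4 = 13
print(delta, L)
```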
In the above discussion, we have considered chip rate sampling at the receiver side. Since in practice the transmit and receive filters are Root Raised Cosine (RRC) filters with some excess bandwidth relative to $1/T_c$, we could actually sample at $N_s/T_c$, where $N_s$ is called the oversampling factor. As opposed to ideal antenna diversity that creates perfectly independent channels and polarization diversity that creates approximately independent channels, temporal oversampling generally gives rise to dependent channels. Although this results in a different performance, the three techniques are completely equivalent from a signal processing point of view. In summary, we enable multi-channel reception at the MS through either $N_r$-fold antenna diversity, $N_p$-fold polarization diversity, $N_s$-fold temporal oversampling or a combination of these three techniques. With $N_c := N_r N_p N_s$ the total number of equivalent subchannels, we can stack the discrete-time received signals for each of these subchannels in $\mathbf{y}[n] := [y_1[n], \ldots, y_{N_c}[n]]$, which we can write as :
$$\mathbf{y}[n] = \sum_{j=1}^{J} \sum_{l=\delta^j}^{\delta^j+L^j} \mathbf{h}^j[l]\, x^j[n-l] + \mathbf{e}[n], \tag{7}$$
where $\mathbf{h}^j[l] := [h_1^j[l], \ldots, h_{N_c}^j[l]]$ is the discrete-time vector channel from the $j$-th BS to the $N_c$ subchannels, and $\mathbf{e}[n] := [e_1[n], \ldots, e_{N_c}[n]]$ is the received noise vector sequence. Note that we model $\mathbf{h}^j[l]$ as a $1 \times N_c$ FIR vector filter of order $L^j$ with delay index $\delta^j$ ($\mathbf{h}^j[l] \neq \mathbf{0}$, for $l = \delta^j$ and $l = \delta^j + L^j$, and $\mathbf{h}^j[l] = \mathbf{0}$, for $l < \delta^j$ and $l > \delta^j + L^j$).

III. IDEAL CHIP EQUALIZATION
An ideal chip equalizer receiver, shown in Figure 2, consists of a linear multi-channel equalizer that operates at the chip-level, followed by a correlator. In case we are interested in the transmission of the $j$-th BS, the multi-channel chip equalizer tries to restore the corresponding multi-user chip sequence $\mathbf{x}^j[i]$, by linearly combining the discrete-time signals from the different subchannels $\{y_{n_c}[n]\}_{n_c=1}^{N_c}$ at the chip-level. Once we have an estimate of the transmitted multi-user chip sequence $\hat{\mathbf{x}}^j[i]$, we can recover the $m$-th user's data symbol stream by a simple despreading operation :
$$\hat{s}_m^j[i] = (1/A_m^j) \cdot \hat{\mathbf{x}}^j[i] \cdot \mathbf{c}_m^j[i]^H, \tag{8}$$
for which we rely on the restored orthonormality of the composite code vectors at each symbol instant. With $\{g^j_{n_c}[q]\}_{q=0}^{Q_c}$ denoting the $Q_c$-th order chip equalizer for the $j$-th base station and the $n_c$-th subchannel, the estimated multi-user chip sequence is : $\hat{x}^j[n-D] = \sum_{n_c=1}^{N_c} g^j_{n_c}[n] \star y_{n_c}[n]$, where $D$ stands for the chip equalizer delay. Aiming at a matrix representation for the chip equalization process, it is convenient to make the underlying convolution operation explicit for $N$ consecutive chip instants that correspond to the $i$-th symbol interval. Therefore, we introduce the following $(Q_c+1)N_c \times N$ output matrix :
$$\mathbf{Y}_a[i] = \begin{bmatrix} \mathbf{y}[iN+a]^T & \cdots & \mathbf{y}[(i+1)N+a-1]^T \\ \vdots & & \vdots \\ \mathbf{y}[iN+a+Q_c]^T & \cdots & \mathbf{y}[(i+1)N+a-1+Q_c]^T \end{bmatrix}, \tag{9}$$
where $a$ is the processing delay and $Q_c$ the chip equalizer order. Let us also define the multi-channel chip equalizer vector $\mathbf{g}^j := [\mathbf{g}^j[Q_c], \ldots, \mathbf{g}^j[0]]$, where $\mathbf{g}^j[q] := [g_1^j[q], \ldots, g_{N_c}^j[q]]$ collects the chip equalizer coefficients of the different subchannels at the $q$-th delay. The estimate of the multi-user chip vector $\mathbf{x}^j[i]$ can then be expressed as :
$$\hat{\mathbf{x}}^j[i] = \mathbf{g}^j \cdot \mathbf{Y}_a[i]. \tag{10}$$
Clearly, there exists a one-to-one relationship between the processing delay $a$ and the equalizer delay $D$, namely $D = a + Q_c$. The output matrix in (9) can be written as :
$$\mathbf{Y}_a[i] = \sum_{j=1}^{J} \mathbf{H}^j \cdot \mathbf{X}_a^j[i] + \mathbf{E}_a[i], \tag{11}$$
where the noise matrix $\mathbf{E}_a[i]$ is similarly defined as $\mathbf{Y}_a[i]$, and the $j$-th base station's $(Q_c+1)N_c \times r^j$ ($r^j = L^j + 1 + Q_c$) channel matrix $\mathbf{H}^j$ with block Toeplitz structure is given by :
$$\mathbf{H}^j = \begin{bmatrix} \mathbf{h}^j[\delta^j+L^j]^T & \cdots & \mathbf{h}^j[\delta^j]^T & \mathbf{0}^T & \cdots & \mathbf{0}^T \\ \mathbf{0}^T & \mathbf{h}^j[\delta^j+L^j]^T & \cdots & \mathbf{h}^j[\delta^j]^T & \cdots & \mathbf{0}^T \\ \vdots & & \ddots & & \ddots & \vdots \\ \mathbf{0}^T & \cdots & \mathbf{0}^T & \mathbf{h}^j[\delta^j+L^j]^T & \cdots & \mathbf{h}^j[\delta^j]^T \end{bmatrix}. \tag{12}$$
The $r^j \times N$ input matrix $\mathbf{X}_a^j[i]$ is defined as : $\mathbf{X}_a^j[i] := [\mathbf{x}^j_{a-\delta^j-L^j}[i]^T, \ldots, \mathbf{x}^j_{a-\delta^j+Q_c}[i]^T]^T$, where $\mathbf{x}_a^j[i] := [x^j[iN+a], \ldots, x^j[(i+1)N+a-1]]$ is the multi-user chip vector, transmitted by the $j$-th BS, starting at delay $a$. By definition, the desired multi-user chip vector in (1) is the one at delay $a = 0$ : $\mathbf{x}^j[i] := \mathbf{x}_0^j[i]$. Provided the processing delay $a$ stays within the range $\delta^j - Q_c \leq a \leq \delta^j + L^j$, $\mathbf{x}^j[i]$ is the $(\Delta^j+1)$-st row of the input matrix $\mathbf{X}_a^j[i]$, with $\Delta^j := L^j + \delta^j - a$. Thus, $\mathbf{x}^j[i] = \mathbf{i}_{\Delta^j} \cdot \mathbf{X}_a^j[i]$, where $\mathbf{i}_{\Delta^j}$ denotes the $1 \times r^j$ unit vector with one on its $(\Delta^j+1)$-st entry. Besides the knowledge of a lower bound on the channel order $L^j$, e.g. $L^j_{min} = 0$, the previous bound on the processing delay $a$ also requires knowledge of an upper and lower bound on the delay index $\delta^j$. The latter corresponds to coarse timing synchronization at the receiver.
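The matrix data model can be checked numerically. The sketch below, for a single base station and without noise, builds the output matrix of (9) from a convolved chip sequence, the block Toeplitz channel matrix of (12), and the input matrix, and then compares both sides of (11). All dimensions and the random channel taps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Nc, Qc = 8, 2, 3            # spreading factor, subchannels, equalizer order (assumed)
delta, L = 1, 2                # delay index and channel order (assumed)
a, i = 0, 2                    # processing delay and symbol index
r = L + 1 + Qc                 # number of columns of the channel matrix

# h[k] holds the 1 x Nc vector tap h[delta + k], k = 0..L.
h = rng.standard_normal((L + 1, Nc)) + 1j * rng.standard_normal((L + 1, Nc))
x = rng.choice([-1.0, 1.0], size=64) + 0j     # stand-in chip sequence x[n]

def y(n):
    """Noise-free received vector y[n] of (7) for one base station."""
    return sum(h[k] * x[n - delta - k] for k in range(L + 1))

# Output matrix (9): block row q, column n holds y[iN + a + q + n]^T.
Y = np.vstack([np.column_stack([y(i * N + a + q + n) for n in range(N)])
               for q in range(Qc + 1)])

# Channel matrix (12): block row q carries h[delta+L]^T ... h[delta]^T from column q on.
H = np.zeros(((Qc + 1) * Nc, r), dtype=complex)
for q in range(Qc + 1):
    for k in range(L + 1):
        H[q * Nc:(q + 1) * Nc, q + k] = h[L - k]

# Input matrix: row c is the chip vector starting at delay a - delta - L + c.
start = i * N + a - delta - L
X = np.vstack([x[start + c: start + c + N] for c in range(r)])

print(np.allclose(Y, H @ X))   # both sides of (11) agree in the noise-free case
```

With these values the desired chip vector sits in row $\Delta^j = L^j + \delta^j - a = 3$ (zero-based) of the input matrix, as stated in the text.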
For convenience, we rewrite (11) as follows :
$$\mathbf{Y}_a[i] = \mathbf{H}^j \cdot \mathbf{X}_a^j[i] + \underbrace{\sum_{k=1,k\neq j}^{J} \mathbf{H}^k \cdot \mathbf{X}_a^k[i] + \mathbf{E}_a[i]}_{\acute{\mathbf{E}}_a[i]}, \tag{13}$$
where the equivalent noise matrix $\acute{\mathbf{E}}_a[i]$ accounts for both the white noise and the non-white inter-cell interference. Finally, (11) can also be written as :
$$\mathbf{Y}_a[i] = \mathbf{H} \cdot \mathbf{X}_a[i] + \mathbf{E}_a[i], \tag{14}$$
where $\mathbf{H} := [\mathbf{H}^1, \ldots, \mathbf{H}^J]$ is the $(Q_c+1)N_c \times r$ compound channel matrix, $\mathbf{X}_a[i] := [\mathbf{X}_a^1[i]^T, \ldots, \mathbf{X}_a^J[i]^T]^T$ the $r \times N$ compound input matrix, and $r = \sum_{j=1}^{J} r^j$ is called the system order. Note that $\mathbf{x}^j[i] = \mathbf{i}_{\nabla_j} \cdot \mathbf{X}_a[i]$, where $\nabla_j := \sum_{k=1}^{j-1} r^k + \Delta^j$. Armed with a suitable data model, we can now proceed with the design of different chip equalizers.
Since Maximum Likelihood (ML) Viterbi equalization at the chip-level is not feasible from a complexity point of view, we will only focus on low-complexity linear alternatives.

A. ZF chip equalization
A first possibility is to apply a Zero-Forcing (ZF) chip equalizer that completely eliminates the ICI and, hence, completely restores the multi-user chip vector $\mathbf{x}^j[i]$ in the absence of noise. Therefore, the ZF chip equalizer should satisfy $\mathbf{g}^j_{ZF} \cdot \mathbf{Y}_a[i] = \mathbf{x}^j[i]$ in the absence of noise, or equivalently $\mathbf{g}^j_{ZF} \cdot \mathbf{H} = \mathbf{i}_{\nabla_j}$. The latter suggests the compound channel matrix $\mathbf{H}$ should have full column rank $r$. This requires $\mathbf{H}$ to be tall : $(Q_c+1)N_c \geq r$, which, with $r = \sum_{j=1}^{J}(L^j + 1 + Q_c)$, is equivalent to :
$$(Q_c+1)(N_c - J) \geq \sum_{j=1}^{J} L^j. \tag{15}$$
Therefore, effectively dealing with $J$ adjacent base stations necessitates multi-channel reception at the MS through at least $N_c \geq J+1$ subchannels. E.g., upon combining $N_p = 2$-fold polarization diversity and $N_s = 2$-fold temporal oversampling, we obtain $N_c = N_r N_p N_s = 4$ (good for dealing with $J = 3$ adjacent base stations) with only $N_r = 1$ (dual-polarized) antenna at the MS. In general, (15) is necessary but not sufficient for $\mathbf{H}$ to have full column rank. The sufficient condition for $\mathbf{H}$ to have full column rank, and thus for ZF identifiability, is threefold [34]. The corresponding Multiple Input Multiple Output (MIMO) transfer function of (14), with $J$ inputs and $N_c$ outputs, should be column-reduced and irreducible. Moreover, the chip equalizer order should satisfy the following dimension condition :
$$(Q_c+1) \geq \sum_{j=1}^{J} L^j, \tag{16}$$
which is stricter than (15). Note that (15) reduces to (16) for $N_c = J+1$. Also, the conditions in (15) and (16) only require knowledge of an upper bound on the channel order $L^j$. Such an upper bound only depends on the propagation environment, including the effect of transmit and receive filters, and can therefore be determined at design time, e.g. $L^j_{max} = \lceil (\tau^j_{max} + 2T_{filter}) / T_c \rceil$.
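The dimension conditions (15) and (16) are easy to evaluate at design time. The following sketch is a hypothetical helper (the function names are ours) that checks both conditions and returns the smallest equalizer order satisfying (15) for given channel order upper bounds.

```python
def zf_conditions(Qc, Nc, channel_orders):
    """Return (tall, dimension): whether (15) and (16) hold.

    channel_orders: list with one (upper bound on the) channel order per
    adjacent base station, so J = len(channel_orders).
    """
    J = len(channel_orders)
    total = sum(channel_orders)
    tall = (Qc + 1) * (Nc - J) >= total       # condition (15)
    dimension = (Qc + 1) >= total             # condition (16)
    return tall, dimension

def min_equalizer_order(Nc, channel_orders):
    """Smallest Qc satisfying (15), or None when Nc < J + 1."""
    J = len(channel_orders)
    if Nc <= J:
        return None
    total = sum(channel_orders)
    return max(0, -(-total // (Nc - J)) - 1)  # ceil(total / (Nc - J)) - 1

# Example from the text: Nc = 4 subchannels and J = 3 adjacent base stations,
# here with an assumed channel order of 3 for each of them.
print(min_equalizer_order(4, [3, 3, 3]))      # 8, since (15) equals (16) when Nc = J + 1
print(zf_conditions(1, 4, [3]))               # (True, False): (16) is stricter than (15)
```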
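Since $\mathbf{H}$ is not square, there exist infinitely many ZF chip equalizers for any well-conditioned channel matrix $\mathbf{H}$. The optimal choice is governed by selecting the ZF chip equalizer that causes minimal noise enhancement :
$$\mathbf{g}^j_{ZF} = \arg\min_{\mathbf{g}^j} E\left\{ \left\| \mathbf{g}^j \cdot \mathbf{E}_a[i] \right\|^2 \right\}, \quad \text{subject to } \mathbf{g}^j \cdot \mathbf{H} = \mathbf{i}_{\nabla_j}, \tag{17}$$
where $\mathbf{i}_{\nabla_j}$ is the unit vector that selects $\mathbf{x}^j[i]$ from the compound input matrix $\mathbf{X}_a[i]$. By solving the constrained optimization problem in (17) with the Lagrangian method [13], we obtain the minimum-norm ZF chip equalizer :
$$\mathbf{g}^j_{ZF} = \mathbf{i}_{\nabla_j} \cdot \left( \mathbf{H}^H \cdot \mathbf{R}_E^{-1} \cdot \mathbf{H} \right)^{-1} \cdot \mathbf{H}^H \cdot \mathbf{R}_E^{-1}, \tag{18}$$
where the noise covariance matrix is given by :
$$\mathbf{R}_E := E\left\{ \mathbf{E}_a[i] \cdot \mathbf{E}_a[i]^H \right\} = N \cdot \sigma_e^2 \cdot \mathbf{I}_{(Q_c+1)N_c}. \tag{19}$$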
Although it served the purpose of simplifying our presentation, the condition for $\mathbf{H}$ to have full column rank is not really necessary. Indeed, even if $\mathbf{H}$ loses rank for some ill-conditioned channels, e.g., channels with a head or tail approaching zero, it is still very likely that the desired row $\mathbf{x}^j[i] := \mathbf{x}_0^j[i]$ belongs to the row space of $\mathbf{Y}_a[i]$. Based on this observation, we may conclude there are no problems related to the channel rank $r$. Similar conclusions were drawn for related blind methods in [35]. In order to support this claim, the performance of the different chip equalizers is simulated for a realistic multi-path channel with an exponential power/delay profile in Section V.

B. MMSE chip equalization
A second possibility is to apply a Minimum Mean Squared Error (MMSE) chip equalizer that only partially eliminates the ICI and, hence, only partially restores the multi-user chip vector $\mathbf{x}^j[i]$ in the absence of noise. It minimizes the MSE between the multi-user chip vector $\mathbf{x}^j[i]$ and its estimate $\hat{\mathbf{x}}^j[i]$ in (10) :
$$\mathbf{g}^j_{MMSE} = \arg\min_{\mathbf{g}^j} E\left\{ \left\| \mathbf{g}^j \cdot \mathbf{Y}_a[i] - \mathbf{x}^j[i] \right\|^2 \right\}. \tag{20}$$
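By solving the unconstrained optimization problem in (20), we obtain the MMSE chip equalizer :
$$\mathbf{g}^j_{MMSE} = \mathbf{i}_{\nabla_j} \cdot \left( \mathbf{H}^H \cdot \mathbf{R}_E^{-1} \cdot \mathbf{H} + \mathbf{R}_X^{-1} \right)^{-1} \cdot \mathbf{H}^H \cdot \mathbf{R}_E^{-1}, \tag{21}$$
where $\mathbf{R}_X := E\{\mathbf{X}_a[i] \cdot \mathbf{X}_a[i]^H\}$ is the input covariance matrix. From (18) and (21) it is also clear that $\mathbf{g}^j_{MMSE}$ reduces to $\mathbf{g}^j_{ZF}$ at high SNR.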
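To make (18) and (21) concrete, the sketch below evaluates both closed forms for a random tall matrix that stands in for the compound channel matrix (it does not have the block Toeplitz structure of (12)). The dimensions, noise levels and the selected row index are assumptions. The last lines also compare (21) with the algebraically equivalent form $\mathbf{i} \cdot \mathbf{R}_X \mathbf{H}^H (\mathbf{H} \mathbf{R}_X \mathbf{H}^H + \mathbf{R}_E)^{-1}$, which follows from the matrix inversion lemma.

```python
import numpy as np

def unit_row(r, idx):
    """1 x r unit vector with a one at (zero-based) position idx."""
    e = np.zeros(r)
    e[idx] = 1.0
    return e

def zf_equalizer(H, idx, R_E):
    """Minimum-norm ZF chip equalizer, equation (18)."""
    W = np.linalg.solve(R_E, H)                 # R_E^{-1} H
    A = H.conj().T @ W                          # H^H R_E^{-1} H
    return unit_row(H.shape[1], idx) @ np.linalg.solve(A, W.conj().T)

def mmse_equalizer(H, idx, R_E, R_X):
    """MMSE chip equalizer, equation (21)."""
    W = np.linalg.solve(R_E, H)
    A = H.conj().T @ W + np.linalg.inv(R_X)     # H^H R_E^{-1} H + R_X^{-1}
    return unit_row(H.shape[1], idx) @ np.linalg.solve(A, W.conj().T)

rng = np.random.default_rng(2)
N, Nc, Qc, L = 16, 2, 5, 3                      # assumed dimensions
rows, r = (Qc + 1) * Nc, L + 1 + Qc             # 12 x 9: tall, as (15) requires
H = rng.standard_normal((rows, r)) + 1j * rng.standard_normal((rows, r))
idx = 4                                         # hypothetical position of the desired row
R_X = N * 1.0 * np.eye(r)                       # white input, unit chip power

g_zf = zf_equalizer(H, idx, N * 1e-2 * np.eye(rows))
g_mmse = mmse_equalizer(H, idx, N * 1e-2 * np.eye(rows), R_X)
g_mmse_high_snr = mmse_equalizer(H, idx, N * 1e-9 * np.eye(rows), R_X)

# Equivalent MMSE form obtained with the matrix inversion lemma.
R_E = N * 1e-2 * np.eye(rows)
g_alt = unit_row(r, idx) @ R_X @ H.conj().T @ np.linalg.inv(H @ R_X @ H.conj().T + R_E)
```

In this sketch `g_zf @ H` returns the unit vector exactly (up to rounding), while `g_mmse @ H` only approximates it, which reflects the partial ICI suppression of the MMSE solution.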
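Starting from (13), we can derive an alternative formulation for the MMSE chip equalizer :
$$\mathbf{g}^j_{MMSE} = \mathbf{i}_{\Delta^j} \cdot \left[ (\mathbf{H}^j)^H \cdot \mathbf{R}_{\acute{E}}^{-1} \cdot \mathbf{H}^j + \mathbf{R}_{X^j}^{-1} \right]^{-1} \cdot (\mathbf{H}^j)^H \cdot \mathbf{R}_{\acute{E}}^{-1}, \tag{22}$$
where the equivalent noise covariance matrix $\mathbf{R}_{\acute{E}}$ is given by :
$$\mathbf{R}_{\acute{E}} := E\left\{ \acute{\mathbf{E}}_a[i] \cdot \acute{\mathbf{E}}_a[i]^H \right\} = \sum_{k=1,k\neq j}^{J} \mathbf{H}^k \cdot \mathbf{R}_{X^k} \cdot (\mathbf{H}^k)^H + \mathbf{R}_E, \tag{23}$$
and the $j$-th input covariance matrix $\mathbf{R}_{X^j}$ is expressed by :
$$\mathbf{R}_{X^j} := E\{\mathbf{X}_a^j[i] \cdot \mathbf{X}_a^j[i]^H\} = N \cdot \sigma_{x^j}^2 \cdot \mathbf{I}_{r^j}. \tag{24}$$
Note that $\sigma_{x^j}^2 = \left( \sum_{m=1}^{M^j} (A_m^j)^2 + (A_p^j)^2 \right) \sigma_s^2 / N$ is the total transmit power of $x^j[n]$, where $\sigma_s^2$ is the average power of the data symbols. From (22), we can conclude that $\mathbf{g}^j_{MMSE}$ suppresses both inter-cell interference through noise whitening and intra-cell interference through ICI elimination. At the same time, $\mathbf{g}^j_{MMSE}$ finds an optimal balance between interference suppression on the one hand and noise enhancement on the other hand.

IV. PRACTICAL DIRECT CHIP EQUALIZER DESIGN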
Up to now, we have assumed perfect Channel State Information (CSI) at the receiver to calculate either ZF (see (18)) or MMSE (see (21)) type of chip equalizers. However, in practice the receiver needs to acquire CSI based on, e.g., training sequences. In this section, we will discuss two direct chip equalizer design algorithms that determine the equalizer coefficients without having to estimate the channel first. In this perspective, Least Squares (LS) algorithms for burst processing serve well as a starting point to derive either Recursive Least Squares (RLS) or Least Mean Squares (LMS) algorithms for adaptive processing.
Starting from (14) and assuming the channel matrix $\mathbf{H}$ to be well conditioned, the multi-channel chip equalizer $\mathbf{g}^j$, for which $\mathbf{g}^j \cdot \mathbf{Y}_a[i] - \mathbf{x}^j[i] = \mathbf{0}$ in the absence of noise, is the ZF chip equalizer given by (18). In the presence of noise, we convert the MMSE minimization problem of (20) into its corresponding Least Squares (LS) minimization problem, which we denote for convenience as :
$$\mathbf{g}^j_{LS} = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left\| \mathbf{g}^j \cdot \mathbf{Y}_a[i] - \mathbf{x}^j[i] \right\|^2, \tag{25}$$
where we consider $I$ consecutive symbol instants $i \in \{0, \ldots, I-1\}$, corresponding to a burst length of $I$ data symbols per user. Two distinct approaches exist to solve this minimization problem. The first approach constrains the chip equalizer vector $\mathbf{g}^j$ to be fixed over the entire burst length (under the assumption that the underlying multi-path channels are static and hence the channel matrix $\mathbf{H}$ remains constant as well) and leads to an LS type of burst processing algorithm. The second approach updates the chip equalizer vector $\mathbf{g}^j[i]$ from one symbol instant to the next and leads to either an RLS or an LMS type of adaptive processing algorithm that is also suited for time-varying multi-path channels. Substituting (1) into (25) leads to :
$$\mathbf{g}^j_{LS} = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left\| \mathbf{g}^j \cdot \mathbf{Y}_a[i] - \mathbf{s}_d^j[i] \cdot \mathbf{A}_d^j \cdot \mathbf{C}_d^j[i] - s_p^j[i] \cdot A_p^j \cdot \mathbf{c}_p^j[i] \right\|^2, \tag{26}$$
which is an LS problem in both the chip equalizer vector $\mathbf{g}^j$ and the multi-user data symbol vectors $\{\mathbf{s}_d^j[i]\}_{i=0}^{I-1}$. Taking (26) as a starting point, we will develop the following two direct chip equalizer design methods, which differ in the amount of a-priori information that they exploit to determine the equalizer coefficients. The first method, coined pilot-trained, only exploits the presence of the known pilot symbols. The second method, coined semi-blind, additionally exploits knowledge of the unused composite code vectors.

A. Pilot-trained chip equalizer
The pilot-trained chip equalizer is calculated from the output matrices $\{\mathbf{Y}_a[i]\}_{i=0}^{I-1}$ based on the knowledge of the pilot composite code vectors $\{\mathbf{c}_p^j[i]\}_{i=0}^{I-1}$ and the pilot symbols $\{s_p^j[i]\}_{i=0}^{I-1}$. It is applicable as such to the UTRA FDD mode of the 3GPP standard. Indeed, the Common PIlot CHannel (CPICH), present in the UTRA FDD mode, provides a continuous code-multiplexed pilot [9].
A.1 LS burst processing
By despreading (26) with the pilot composite code vector $\mathbf{c}_p^j[i]$ and by relying on the orthonormality of the user and the pilot composite code vectors at each symbol instant, we obtain :
$$\mathbf{g}_{LS}^j = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left| \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{c}_p^j[i]^H - s_p^j[i] \cdot A_p^j \right|^2 , \tag{27}$$
which can be interpreted as follows. The equalized output matrix $\mathbf{g}^j \cdot \mathbf{Y}_a[i]$ is first despread with the pilot composite code vector $\mathbf{c}_p^j[i]$. The resulting equalized output matrix after despreading $\mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{c}_p^j[i]^H$ should then be as close as possible in a Least Squares sense to the transmitted pilot symbol $s_p^j[i] \cdot A_p^j$. However, the order of chip equalization and despreading can also be reversed. So, the output matrix $\mathbf{Y}_a[i]$ is first despread with the pilot composite code vector $\mathbf{c}_p^j[i]$, leading to :
$$\mathbf{g}_{LS}^j = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left| \mathbf{g}^j \cdot \bar{\mathbf{y}}_p^j[i]^H - s_p^j[i] \right|^2 , \tag{28}$$
where the $1 \times (Q_c+1)N_c$ pilot code despread output vector $\bar{\mathbf{y}}_p^j[i] := (1/A_p^j) \cdot \mathbf{c}_p^j[i] \cdot \mathbf{Y}_a[i]^H$ collects the correlation values of the $N_c$ received signals with the pilot composite code vector at $Q_c+1$ different time instants.
A.2 SRI-RLS adaptive processing
Starting from the corresponding LS algorithm for burst processing in (28), it is possible to derive a numerically stable Square Root Information (SRI) RLS algorithm for adaptive processing. Such an algorithm consists of a QR Decomposition (QRD) updating step and a triangular backsubstitution step [36]. The QRD-updating step keeps track of a $(Q_c+1)N_c \times (Q_c+1)N_c$ lower-triangular factor $\mathbf{R}^j[i]$ and a corresponding $1 \times (Q_c+1)N_c$ right-hand side $\mathbf{z}^j[i]$. These are updated with the new pilot code despread output vector $\bar{\mathbf{y}}_p^j[i]$ and the new pilot symbol $s_p^j[i]$, respectively :
$$\begin{bmatrix} \mathbf{R}^j[i] & \mathbf{0}_{(Q_c+1)N_c \times 1} \\ \mathbf{z}^j[i] & ? \end{bmatrix} \leftarrow \begin{bmatrix} \lambda \mathbf{R}^j[i-1] & \bar{\mathbf{y}}_p^j[i]^H \\ \lambda \mathbf{z}^j[i-1] & s_p^j[i] \end{bmatrix} \cdot \mathbf{T}^j[i], \tag{29}$$
where $\lambda$ is the forget factor, ? is a don't care value, and $\mathbf{T}^j[i]$ represents the orthogonal transformation matrix at the $i$-th symbol instant that triangularizes the structured compound matrix, consisting of a triangular matrix plus one extra column. This transformation matrix can be factorized into a sequence of Givens transformations [37]. The forget factor $\lambda$, corresponding to a data memory of $\frac{1}{1-\lambda}$, discounts less
reliable past data and should be chosen in correspondence with the normalized coherence time of the underlying fading multi-path channels. Hence, for static multi-path channels, the forget factor should be $\lambda = 1$ and the SRI-RLS adaptive processing algorithm achieves exactly the same performance as its corresponding LS burst processing algorithm. The new value of the chip equalizer vector $\mathbf{g}_{RLS}^j[i]$ finally follows from the triangular backsubstitution step :
$$\mathbf{g}_{RLS}^j[i] \cdot \mathbf{R}^j[i] = \mathbf{z}^j[i]. \tag{30}$$
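As a consistency check on (29) and (30), one can multiply both sides of (29) by their own Hermitian transposes and use the orthogonality of $\mathbf{T}^j[i]$. The worked sketch below is not part of the original derivation; the unrolled sums assume a zero initialization of $\mathbf{R}^j$ and $\mathbf{z}^j$ before the first symbol instant, which is our assumption, and the last line only holds once $\mathbf{R}^j[i]$ has become invertible.

```latex
% Multiply (29) on the right by its own Hermitian transpose; T^j[i] T^j[i]^H = I.
\begin{align*}
\mathbf{R}^j[i]\,\mathbf{R}^j[i]^H &= \lambda^2\,\mathbf{R}^j[i-1]\,\mathbf{R}^j[i-1]^H
  + \bar{\mathbf{y}}_p^j[i]^H\,\bar{\mathbf{y}}_p^j[i], \\
\mathbf{z}^j[i]\,\mathbf{R}^j[i]^H &= \lambda^2\,\mathbf{z}^j[i-1]\,\mathbf{R}^j[i-1]^H
  + s_p^j[i]\,\bar{\mathbf{y}}_p^j[i].
\end{align*}
% Right-multiply (30) by R^j[i]^H and unroll both recursions (assumed zero initialization):
\begin{equation*}
\mathbf{g}_{RLS}^j[i] \cdot \sum_{k=0}^{i} \lambda^{2(i-k)}\,
  \bar{\mathbf{y}}_p^j[k]^H\,\bar{\mathbf{y}}_p^j[k]
  = \sum_{k=0}^{i} \lambda^{2(i-k)}\, s_p^j[k]\,\bar{\mathbf{y}}_p^j[k].
\end{equation*}
```

For $\lambda = 1$ these are the normal equations of the LS problem in (28), which is in line with the statement that the SRI-RLS algorithm then matches its LS burst processing counterpart.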
A.3 NLMS adaptive processing
By taking the instantaneous gradient of the LS cost function in (28), we naturally arrive at a computationally more efficient Normalized Least Mean Squares (NLMS) algorithm for adaptive processing :
$$\mathbf{g}_{LMS}^j[i] = \mathbf{g}_{LMS}^j[i-1] - \frac{\bar{\mu}}{E_p^j[i]} \cdot \left( \mathbf{g}_{LMS}^j[i-1] \cdot \bar{\mathbf{y}}_p^j[i]^H - s_p^j[i] \right) \cdot \bar{\mathbf{y}}_p^j[i], \tag{31}$$
where the stepsize $\bar{\mu}$ is normalized with respect to the input energy $E_p^j[i] := \bar{\mathbf{y}}_p^j[i] \cdot \bar{\mathbf{y}}_p^j[i]^H$ and can vary between 0 and 1. From (31), it is also clear that the chip equalizer vector $\mathbf{g}_{LMS}^j[i-1]$ is updated in the direction of the input vector $\bar{\mathbf{y}}_p^j[i]$ with an amount that is proportional to both the normalized stepsize $\bar{\mu}$ and the error signal $(\mathbf{g}_{LMS}^j[i-1] \cdot \bar{\mathbf{y}}_p^j[i]^H - s_p^j[i])$.
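The update in (31) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `nlms_pilot_update`, the toy dimension `T = 4`, and the noise-free synthetic data (a hypothetical reference equalizer `g_ref` that generates the target symbols) are all ours. Row vectors are stored as lists of complex numbers.

```python
import random

def nlms_pilot_update(g, y_bar, s_p, mu_bar):
    """One NLMS step following (31); returns the new equalizer and the a-priori error."""
    # Error signal g[i-1] * y_bar^H - s_p
    err = sum(gk * yk.conjugate() for gk, yk in zip(g, y_bar)) - s_p
    # Input energy E_p = y_bar * y_bar^H
    energy = sum(abs(yk) ** 2 for yk in y_bar)
    if energy == 0.0:
        return list(g), err  # nothing to learn from an all-zero input
    step = mu_bar / energy
    return [gk - step * err * yk for gk, yk in zip(g, y_bar)], err

def rand_c():
    """Draw one complex Gaussian sample (toy data only)."""
    return complex(random.gauss(0, 1), random.gauss(0, 1))

random.seed(0)
T = 4                                   # toy number of equalizer taps (assumption)
g_ref = [rand_c() for _ in range(T)]    # hypothetical target equalizer
g = [0j] * T
for _ in range(200):
    y_bar = [rand_c() for _ in range(T)]
    # Noise-free toy target: the symbol the reference equalizer would produce
    s_p = sum(a * b.conjugate() for a, b in zip(g_ref, y_bar))
    g, err = nlms_pilot_update(g, y_bar, s_p, mu_bar=0.7)

mismatch = max(abs(a - b) for a, b in zip(g, g_ref))
```

Substituting the update back into the error expression shows that the a-posteriori error equals $(1 - \bar{\mu})$ times the a-priori error, which is why a normalized stepsize between 0 and 1 is a natural range.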
Note that the SRI-RLS and the NLMS adaptive algorithms operate at the symbol rate after despreading, and thus benefit from the processing gain. Unlike our algorithms, the adaptive algorithm of [38] operates at the chip rate before despreading, and possibly suffers from slow convergence, especially under low SNR conditions.
B. Semi-blind chip equalizer
Compared to the pilot-trained chip equalizer, the semi-blind chip equalizer additionally exploits knowledge of the active multi-user code matrices $\{\mathbf{C}_d^j[i]\}_{i=0}^{I-1}$ or, equivalently, the inactive multi-user code matrices $\{\bar{\mathbf{C}}_d^j[i]\}_{i=0}^{I-1}$. Unlike the pilot-trained chip equalizer, the semi-blind chip equalizer is not applicable as such to the UTRA FDD mode of the 3GPP standard. Indeed, the set of active/inactive spreading codes is usually only known at the BS and not at the MS. Estimating the set of active/inactive spreading codes at the MS could be one option to solve this problem. However, this would unnecessarily further increase the complexity of the MS, on the one hand, and probably lead to some estimation errors, on the other hand. Slightly adapting the control signalling of the standard could be a more viable option. Indeed, the information about the set of active/inactive spreading codes is cell specific, and thus common to all MSs.
Hence, it could be broadcast on a Common Control CHannel (CCCH) that can be received by all active MSs, e.g., the Broadcast CHannel (BCH) [9]. The incurred signalling overhead is rather limited, since only the “labels” of the active/inactive spreading codes should be communicated over the BCH. The actual generation of these codes can still take place at the MS. A detailed study of the impact of this issue on the 3GPP standard goes beyond the scope of this paper. In the following, we assume perfect knowledge of the set of active/inactive spreading codes at the MS.
B.1 LS burst processing
By despreading (26) with the active multi-user code matrix $\mathbf{C}_d^j[i]$ and by assuming the chip equalizer vector $\mathbf{g}^j$ to be known and fixed, we obtain an LS estimate of the transmitted multi-user data symbol vector $\hat{\mathbf{s}}_d^j[i] \cdot \hat{\mathbf{A}}_d^j = \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{C}_d^j[i]^H$. Substituting this expression for $\hat{\mathbf{s}}_d^j[i] \cdot \hat{\mathbf{A}}_d^j$ into the original LS problem of (26) leads to a modified LS problem in the chip equalizer vector $\mathbf{g}^j$ only :
$$\mathbf{g}_{LS}^j = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left\| \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{P}_d^j[i] - s_p^j[i] \cdot A_p^j \cdot \mathbf{c}_p^j[i] \right\|^2 , \tag{32}$$
where $\mathbf{P}_d^j[i] := (\mathbf{I}_N - \mathbf{C}_d^j[i]^H \cdot \mathbf{C}_d^j[i])$ is the orthogonal projection matrix corresponding to $\mathbf{C}_d^j[i]$. The interpretation of (32) goes as follows. The equalized output matrix $\mathbf{g}^j \cdot \mathbf{Y}_a[i]$ is first projected on the orthogonal complement of the subspace spanned by the active multi-user code matrix $\mathbf{C}_d^j[i]$, employing the orthogonal projection matrix $\mathbf{P}_d^j[i]$. The resulting equalized output matrix after projecting, i.e., $\mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{P}_d^j[i]$, should then be as close as possible in a Least Squares sense to the transmitted pilot chip vector $s_p^j[i] \cdot A_p^j \cdot \mathbf{c}_p^j[i]$. With some algebraic manipulations, it is easy to prove that (32) can be rewritten as :
$$\mathbf{g}_{LS}^j = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left( \left| \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{c}_p^j[i]^H - s_p^j[i] \cdot A_p^j \right|^2 + \left\| \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{P}_{d,p}^j[i] \right\|^2 \right) , \tag{33}$$
where $\mathbf{P}_{d,p}^j[i] := (\mathbf{I}_N - \mathbf{C}_d^j[i]^H \cdot \mathbf{C}_d^j[i] - \mathbf{c}_p^j[i]^H \cdot \mathbf{c}_p^j[i])$ is the orthogonal projection matrix corresponding to both $\mathbf{C}_d^j[i]$ and $\mathbf{c}_p^j[i]$. This shows that the original LS problem of (32) naturally decouples into two different parts : a training-based part and a fully-blind part. On the one hand, the training-based part (described by the first term of (33)) corresponds to the pilot-trained chip equalizer of (27). On the other hand, the fully-blind part (described by the second term of (33)) projects the equalized output matrix $\mathbf{g}^j \cdot \mathbf{Y}_a[i]$ on the orthogonal complement of the subspace spanned by both the active multi-user code matrix $\mathbf{C}_d^j[i]$ and the pilot composite code vector $\mathbf{c}_p^j[i]$. Furthermore, minimizing this equalized output matrix after projecting
$\mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{P}_{d,p}^j[i]$ in a Least Squares sense, actually corresponds to the Minimum Output Energy (MOE) criterion of [39], [40], [41]. This explains why the chip equalizer design method discussed above can be classified as a semi-blind method. Let us now turn our attention again to the orthogonal projection matrix $\mathbf{P}_{d,p}^j[i]$ that performs a projection on the orthogonal complement of the subspace spanned by the active multi-user code matrix $\mathbf{C}_d^j[i]$ as well as the pilot composite code vector $\mathbf{c}_p^j[i]$. Due to the orthonormality of the composite code vectors at each symbol instant, this is equivalent to directly projecting on the subspace spanned by the inactive multi-user code matrix $\bar{\mathbf{C}}_d^j[i]$. Therefore, we are allowed to substitute $\mathbf{P}_{d,p}^j[i]$ by $\bar{\mathbf{C}}_d^j[i]^H \cdot \bar{\mathbf{C}}_d^j[i]$ in (33), leading to :
$$\mathbf{g}_{LS}^j = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left( \left| \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \mathbf{c}_p^j[i]^H - s_p^j[i] \cdot A_p^j \right|^2 + \left\| \mathbf{g}^j \cdot \mathbf{Y}_a[i] \cdot \bar{\mathbf{C}}_d^j[i]^H \right\|^2 \right) , \tag{34}$$
where the right multiplication with $\bar{\mathbf{C}}_d^j[i]$ has been left out because it does not alter the LS criterion. The second term in (34) expresses that the energy in the equalized output matrix after despreading with the inactive multi-user code matrix $\bar{\mathbf{C}}_d^j[i]$ should be minimal. This suggests that the inactive composite code vectors can be considered as virtual pilot composite code vectors that continuously carry zero virtual pilot symbols. Similar to the pilot-trained chip equalizer, the order of chip equalization and despreading can again be reversed. So, the output matrix $\mathbf{Y}_a[i]$ is first despread with the inactive multi-user code matrix $\bar{\mathbf{C}}_d^j[i]$, leading to :
$$\mathbf{g}_{LS}^j = \arg\min_{\mathbf{g}^j} \sum_{i=0}^{I-1} \left( \left| \mathbf{g}^j \cdot \bar{\mathbf{y}}_p^j[i]^H - s_p^j[i] \right|^2 + \left\| \mathbf{g}^j \cdot \bar{\mathbf{Y}}^j[i]^H \right\|^2 \right) , \tag{35}$$
where the $\bar{m}$-th row of the $\bar{M}^j \times (Q_c+1)N_c$ virtual pilot code despread output matrix $\bar{\mathbf{Y}}^j[i] := \bar{\mathbf{C}}_d^j[i] \cdot \mathbf{Y}_a[i]^H$ collects the correlation values of the $N_c$ received signals with the corresponding inactive composite code vector $\bar{\mathbf{c}}_{\bar{m}}^j[i]$ at $Q_c+1$ different time instants.
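The substitution of the projection matrix on the complement of the active and pilot codes by the outer product of the inactive codes, used to go from (33) to (34), can be checked numerically. The sketch below is only an illustration under our own assumptions: a Sylvester-type Walsh-Hadamard matrix of length 16, an arbitrary unit-modulus QPSK scrambling sequence, the all-ones code as pilot, the next 7 codes as active and the remaining 8 as inactive. The helper names `hadamard` and `outer_sum` are hypothetical.

```python
import random

def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def outer_sum(rows, n):
    """Return sum over the given 1 x n code rows c of c^H * c (an n x n matrix)."""
    out = [[0j] * n for _ in range(n)]
    for c in rows:
        for a in range(n):
            for b in range(n):
                out[a][b] += c[a].conjugate() * c[b]
    return out

random.seed(1)
N, M = 16, 7                            # spreading factor and number of active users
# Unit-modulus QPSK scrambling chips for one symbol period (toy stand-in for the overlay code)
scr = [random.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / 2 ** 0.5 for _ in range(N)]
# Orthonormal composite codes: Walsh-Hadamard row times scrambling, scaled by 1/sqrt(N)
codes = [[w * s / N ** 0.5 for w, s in zip(row, scr)] for row in hadamard(N)]
c_p, C_d, C_bar = codes[0], codes[1:M + 1], codes[M + 1:]

active = outer_sum([c_p] + C_d, N)      # C_d^H C_d + c_p^H c_p
inactive = outer_sum(C_bar, N)          # C_bar^H C_bar
P_dp = [[(1.0 if a == b else 0.0) - active[a][b] for b in range(N)] for a in range(N)]
max_diff = max(abs(P_dp[a][b] - inactive[a][b]) for a in range(N) for b in range(N))
```

In this toy setup `max_diff` stays at the level of floating-point rounding, because the 16 composite codes together form an orthonormal basis.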
B.2 SRI-RLS adaptive processing
Similar to the pilot-trained chip equalizer, an SRI-RLS adaptive version of the semi-blind chip equalizer can be obtained from its corresponding LS algorithm for burst processing in (35). Compared to (29), the QRD-updating step now also processes the new virtual pilot code despread output matrix $\bar{\mathbf{Y}}^j[i]$, besides the new pilot code despread output vector $\bar{\mathbf{y}}_p^j[i]$ and the new pilot symbol $s_p^j[i]$ :
$$\begin{bmatrix} \mathbf{R}^j[i] & \mathbf{0}_{(Q_c+1)N_c \times (1+\bar{M}^j)} \\ \mathbf{z}^j[i] & ?_{1 \times (1+\bar{M}^j)} \end{bmatrix} \leftarrow \begin{bmatrix} \lambda \mathbf{R}^j[i-1] & \bar{\mathbf{y}}_p^j[i]^H & \bar{\mathbf{Y}}^j[i]^H \\ \lambda \mathbf{z}^j[i-1] & s_p^j[i] & \mathbf{0}_{1 \times \bar{M}^j} \end{bmatrix} \cdot \mathbf{T}^j[i], \tag{36}$$
where $?_{1 \times (1+\bar{M}^j)}$ is a vector of don't care values. The orthogonal transformation matrix $\mathbf{T}^j[i]$ now triangularizes a larger structured compound matrix, consisting of a triangular matrix plus $(1 + \bar{M}^j)$ extra columns (instead of just one extra column in the pilot-trained case). The triangular backsubstitution step of (30) remains valid :
$$\mathbf{g}_{RLS}^j[i] \cdot \mathbf{R}^j[i] = \mathbf{z}^j[i]. \tag{37}$$
B.3 NLMS adaptive processing
Similar to the derivation for the NLMS adaptive version of the pilot-trained chip equalizer estimator in (31), we naturally obtain its semi-blind counterpart by taking the instantaneous gradient of its corresponding LS cost function in (35), which leads to the following updating algorithm :
$$\mathbf{g}_{LMS}^j[i] = \mathbf{g}_{LMS}^j[i-1] - \frac{\bar{\mu}}{E^j[i]} \cdot \left[ \left( \mathbf{g}_{LMS}^j[i-1] \cdot \bar{\mathbf{y}}_p^j[i]^H - s_p^j[i] \right) \cdot \bar{\mathbf{y}}_p^j[i] + \mathbf{g}_{LMS}^j[i-1] \cdot \bar{\mathbf{Y}}^j[i]^H \cdot \bar{\mathbf{Y}}^j[i] \right] , \tag{38}$$
where the stepsize $\bar{\mu}$ is now normalized with respect to the total input energy $E^j[i] := E_p^j[i] + \mathrm{tr}\{\bar{\mathbf{Y}}^j[i] \cdot \bar{\mathbf{Y}}^j[i]^H\}$. From (38), it is also clear that the chip equalizer vector $\mathbf{g}_{LMS}^j[i-1]$ is now updated in a direction that is a linear combination of the real pilot code despread output vector $\bar{\mathbf{y}}_p^j[i]$ and the virtual pilot code despread output vectors $\{\bar{\mathbf{y}}_{\bar{m}}^j[i]\}_{\bar{m}=1}^{\bar{M}^j}$ (the rows of $\bar{\mathbf{Y}}^j[i]$). The coefficients of this linear combination are the real pilot error signal $(\mathbf{g}_{LMS}^j[i-1] \cdot \bar{\mathbf{y}}_p^j[i]^H - s_p^j[i])$ and the virtual pilot error signals $\{(\mathbf{g}_{LMS}^j[i-1] \cdot \bar{\mathbf{y}}_{\bar{m}}^j[i]^H - 0)\}_{\bar{m}=1}^{\bar{M}^j}$, respectively.
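The semi-blind update in (38) can be sketched along the same lines. Again this is only an illustrative sketch: the names `dot_h`, `cost` and `nlms_semiblind_update`, the toy sizes and the random test data are assumptions of ours. The rows of `Y_bar` play the role of the virtual pilot code despread output vectors, each with a target symbol of zero.

```python
import random

def dot_h(g, y):
    """Row-vector product g * y^H for two equal-length lists of complex numbers."""
    return sum(gk * yk.conjugate() for gk, yk in zip(g, y))

def cost(g, y_bar, s_p, Y_bar):
    """Instantaneous cost of (35): pilot error energy plus virtual pilot output energy."""
    return abs(dot_h(g, y_bar) - s_p) ** 2 + sum(abs(dot_h(g, row)) ** 2 for row in Y_bar)

def nlms_semiblind_update(g, y_bar, s_p, Y_bar, mu_bar):
    """One update following (38); Y_bar is a list of rows, one per inactive code."""
    err_p = dot_h(g, y_bar) - s_p                  # real pilot error signal
    errs_v = [dot_h(g, row) for row in Y_bar]      # virtual pilot error signals (target 0)
    # Total input energy E = E_p + tr{Y_bar * Y_bar^H}
    energy = sum(abs(v) ** 2 for v in y_bar) + sum(abs(v) ** 2 for row in Y_bar for v in row)
    if energy == 0.0:
        return list(g)
    step = mu_bar / energy
    return [
        g[k] - step * (err_p * y_bar[k] + sum(e * row[k] for e, row in zip(errs_v, Y_bar)))
        for k in range(len(g))
    ]

def rand_c():
    """Draw one complex Gaussian sample (toy data only)."""
    return complex(random.gauss(0, 1), random.gauss(0, 1))

random.seed(0)
T, M_BAR = 4, 3                                    # toy sizes (assumption)
g0 = [rand_c() for _ in range(T)]
y_bar = [rand_c() for _ in range(T)]
Y_bar = [[rand_c() for _ in range(T)] for _ in range(M_BAR)]
s_p = 1 + 1j
g1 = nlms_semiblind_update(g0, y_bar, s_p, Y_bar, mu_bar=0.7)
cost_before = cost(g0, y_bar, s_p, Y_bar)
cost_after = cost(g1, y_bar, s_p, Y_bar)
```

Because the stepsize is normalized by the total input energy, a single update with $\bar{\mu}$ between 0 and 1 cannot increase the instantaneous cost; with an empty `Y_bar` the function reduces to the pilot-trained update of (31).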
C. Complexity comparison
With $T := (Q_c+1)N_c$ the total number of multi-channel chip equalizer taps, Table I evaluates the operation count of the different adaptive chip equalizers, in terms of multiply/accumulates. A multiply/accumulate is an operation of the type $d = c + a \cdot b$ that involves one complex multiplication, one complex addition, three complex reads and one complex write. Hence, the number of corresponding data transfers is four times the number of multiply/accumulates. Also, Table I distinguishes between the updating phase, where the new values of the chip equalizer coefficients are determined, and the detection phase, where these chip equalizer coefficients are used to detect the desired user's data symbol. Both phases are executed continuously, at the symbol rate, where the latter is common to all adaptive chip equalizers. According to (29) and (30), the RLS pilot-trained updating consists of three computational steps per time update, namely first a pilot code despreading step, then a QRD-updating step and finally a triangular backsubstitution step. From (28), it follows that the pilot code despreading step only requires $O(NT)$ operations. On the other hand, both the QRD-updating and the triangular backsubstitution step require $O(T^2)$ operations, leading to an overall complexity of $O(T^2)$. According to (31), the LMS pilot-trained updating consists of four computational steps per time update : the pilot code despreading step, the input energy calculation step, the error signal calculation step and finally the actual updating step. Each of these steps requires at most $O(NT)$ operations, resulting in an overall complexity of $O(NT)$. As suggested by (36), the virtual pilot code despreading step and the QRD-updating step of the RLS semi-blind updating are $(\bar{M}^j + 1)$ times more complex than their respective counterparts in the RLS pilot-trained updating. This leads to an overall complexity of $O(\bar{M}^j T^2)$. A similar observation can be made for the LMS semi-blind updating in (38), resulting in an overall complexity of $O(\bar{M}^j N T)$. For specific values of the complexity, we refer to the next section.
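To make the link between operation counts and throughput concrete, the sketch below converts multiply/accumulates per symbol into operations and data transfers per second, using the case-study values of the next section ($N = 16$, $Q_c = 23$, $N_c = 2$, $\bar{M}^1 = 8$, chip rate 3.84 MHz). The per-symbol detection cost `N * T + N` is our own assumption (equalizing $N$ chips with $T$ taps each, then despreading); it is not taken from Table I, although it lands close to the detection figure quoted in the next section. The order terms are leading-order scalings only, not exact counts.

```python
def throughput(macs_per_symbol, symbol_rate_hz):
    """Return (operations/s, complex words transferred/s) for a given MAC count per symbol."""
    ops = macs_per_symbol * symbol_rate_hz
    return ops, 4 * ops    # each multiply/accumulate implies three reads and one write

N, Qc, Nc, M_BAR = 16, 23, 2, 8
T = (Qc + 1) * Nc          # total number of equalizer taps
Rs = 3.84e6 / N            # symbol rate in symbols per second

# Assumed detection cost per symbol: N chip-level equalizer outputs plus one despreading
detection_macs = N * T + N
det_ops, det_words = throughput(detection_macs, Rs)

# Leading-order terms of the updating complexities (orders of growth only)
order_terms = {
    "RLS pilot-trained": T * T,
    "LMS pilot-trained": N * T,
    "RLS semi-blind": M_BAR * T * T,
    "LMS semi-blind": M_BAR * N * T,
}
```

The exact ratios between the algorithms depend on the constants hidden in the order notation, so the dictionary should be read as a rough guide to how the cost grows with $T$, $N$ and $\bar{M}^j$.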
V. SIMULATION RESULTS
We consider the downlink of a wideband DS-CDMA system similar to the UTRA FDD mode of the 3GPP standard [42]. The system operates at a carrier frequency of $F_c = 2$ GHz corresponding to a wavelength of $\lambda_c = \frac{c}{F_c} = 15$ cm, with $c$ the speed of light. Each base station transmits a multi-user chip sequence at a chip rate of $R_c = \frac{1}{T_c} = 3.84$ MHz corresponding to a chip period of $T_c = 260.4$ ns. The multi-user chip sequence consists of $M^j$ active user signals and a continuous pilot signal. Each user's QPSK modulated symbol sequence ($n_b = 2$ bits per symbol) is spread by an orthogonal Walsh-Hadamard spreading code of length $N = 16$ that is user specific, which results in a channel symbol rate of $R_s = \frac{R_c}{N} = 240$ kHz. The code-multiplexed pilot continuously transmits the same QPSK pilot symbol $1 + j$, that is spread by the all-ones Walsh-Hadamard code. The different chip sequences are summed synchronously and the resulting sum signal is subsequently scrambled by an aperiodic overlay code of length 38400 chips that is base station specific. Unless otherwise stated, we assume a single-cell context in which the intra-cell MUI is dominating the performance and the inter-cell MUI can be included in the AWGN term. The desired cell, which we assume to be labeled with $j = 1$, is half loaded with $M^1 = 7$ active users and the pilot. Hence, $\bar{M}^1 := N - M^1 - 1 = 8$ codes are inactive in the desired cell. The active users are divided into 4 equal-power desired users and 3 equal-power interfering users. The pilot energy per chip $E_{c,p}^1 := (A_p^1)^2 \sigma_s^2 / N$ is 20 % of the total transmit power $\sigma_{x^1}^2$. The ratio of the desired user's energy per chip $E_{c,d}^1 := (A_d^1)^2 \sigma_s^2 / N$ versus the total transmit power $\sigma_{x^1}^2$ is fixed to $E_{c,d}^1 / \sigma_{x^1}^2 = -10$ dB. The interfering user's power is scaled accordingly.
As specified in [42] as channel case 3, we assume that each channel has $R^1 + 1 = 4$ Rayleigh fading
paths according to the power/delay profile in Table II. For the transmit as well as the receive filter, we take a Root Raised Cosine (RRC) filter with a roll-off factor of 0.22 and an impulse response truncated to 5 chips on each side ($T_{filter} = 10 T_c$). Hence, the upper bound on the channel order $L^1$ can be derived as $L_{max}^1 = \lceil (\tau_{max}^1 + 2 T_{filter}) / T_c \rceil = 23$. All channels are generated at 4 times the chip rate and then downsampled to the chip rate. For the additional set of channels obtained by temporal oversampling ($N_s = 2$), an offset of $\frac{T_c}{2}$ is introduced before downsampling. In the following, we simulate the average Bit Error Rate (BER) versus the average received Signal-to-Noise Ratio (SNR) per bit. The performance is averaged across the desired users for 500 Monte Carlo channel realizations. We define the SNR per bit, $SNR := E_{b,d}^1 / \sigma_e^2$, as the ratio of the desired user's energy per bit $E_{b,d}^1 := (A_d^1)^2 \sigma_s^2 / n_b$ versus the noise variance $\sigma_e^2$.
A. Oversampling diversity versus antenna diversity
Figure 3 compares the performance of the ideal RAKE receiver and the ideal MMSE chip equalizer for four different configurations of multi-channel reception at the MS without polarization diversity ($N_p = 1$) : single-antenna reception without oversampling ($N_r = 1$, $N_s = 1$), single-antenna reception with oversampling ($N_r = 1$, $N_s = 2$), dual-antenna reception without oversampling ($N_r = 2$, $N_s = 1$) and dual-antenna reception with oversampling ($N_r = 2$, $N_s = 2$). In order to satisfy (16), we choose the chip equalizer order $Q_c = L_{max}^1 = 23$. Serving as a lower bound on the BER performance, the single-user bound corresponds to the theoretical BER-curve of QPSK with $N_r (R^1 + 1) = 8$-th order diversity in Rayleigh fading channels [43]. The single-antenna RAKE receiver suffers from a BER floor at $5 \cdot 10^{-2}$ because it only maximizes the received signal power but does not cope with the Inter-Chip Interference (ICI) and the associated Multi-User Interference (MUI) at high SNR. Unlike the RAKE receiver, the single-antenna MMSE chip equalizer suppresses these interferences and achieves a significant performance gain compared to the former : e.g., at a BER of $6 \cdot 10^{-2}$ it realizes a 6 dB gain. However, the single-antenna MMSE chip equalizer does not completely suppress the ICI and MUI at high SNR since it does not fulfill the ZF requirement for multi-channel reception in case of a single base station, namely $N_c \geq 2$ (see (15) in Subsection III-A). Although conceptually very similar to antenna diversity, oversampling diversity does not provide any significant performance improvement, neither for the RAKE receiver nor for the MMSE chip equalizer, because the resulting subchannels are highly correlated due to the small roll-off factor. Unlike oversampling diversity, antenna diversity enables a significant performance improvement, both for the RAKE receiver and the MMSE chip equalizer.
The RAKE receiver still suffers from a BER floor but one
that is much lower than in the single-antenna case, namely at $10^{-2}$. The dual-antenna MMSE chip equalizer exhibits a significant performance improvement compared to its single-antenna counterpart : e.g., a 5 dB gain at a BER of $10^{-3}$. Note that on top of this diversity gain, there is an additional array gain of $10 \cdot \log_{10}(N_r) = 3$ dB.
B. Polarization diversity versus antenna diversity
However, antenna diversity generally calls for large antenna spacings to enable sufficient diversity gain : in some cases, depending on the angle spread, antenna spacings of $6 \lambda_c = 90$ cm are required to attain sufficient decorrelation of the received signals. Unlike antenna diversity, polarization diversity relies on a dual-polarized antenna (consisting of two co-located antennas) that captures different orthogonal polarizations of the received signal. As indicated in Section II, its performance mainly depends on two parameters, namely the Cross Polarization Discrimination (XPD) and the Cross Correlation Coefficient (XCC). Figure 4 shows the performance of the ideal MMSE chip equalizer with one dual-polarized antenna ($N_r = 1$, $N_p = 2$), and without temporal oversampling ($N_s = 1$), for four different values of the XPD, ranging from $-3$ to $-12$ dB, while keeping the XCC fixed to 0. Conversely, Figure 5 shows the performance for four different values of the XCC, ranging from 0.1 to 0.7, while keeping the XPD fixed to 0 dB. As a benchmark, we also plot the performance for ideal antenna diversity with uncorrelated received signals assuming sufficiently large antenna spacings ($N_r = 2$, $N_p = 1$). For a dual-polarized antenna with a 45 degrees slant, the XPD is larger than $-3$ dB in 90 % of the test cases [33]. From Figure 4, this results in a marginal 0.4 dB loss at a BER of $10^{-3}$ relative to antenna diversity. For all dual-polarized antenna configurations, the XCC remains below 0.7 for over 95 % of the test cases [33]. From Figure 5, this results in a considerable 1.3 dB loss at a BER of $10^{-3}$ relative to antenna diversity.
C. Comparison of the different adaptive chip equalizers
In order to test the adaptive chip equalizers of Section IV in conjunction with polarization diversity, we deploy a dual-polarized antenna at the mobile ($N_r = 1$, $N_p = 2$) with an XPD of $-3$ dB and an XCC of 0.7. Figures 6 and 7 show the performance of the different adaptive chip equalizers compared to the ideal MMSE chip equalizer (assuming perfect CSI) over a time-varying multipath channel, whose Rayleigh fading paths possess the classical Jakes spectrum. For the above values of XPD and XCC and fixing the BER at $10^{-3}$, the ideal MMSE chip equalizer with polarization diversity ($N_r = 1$, $N_p = 2$) incurs a 1.4 dB loss compared to the one with ideal antenna diversity ($N_r = 2$, $N_p = 1$) but still realizes a 3.6 dB gain
relative to its single-antenna counterpart. On top of this diversity gain, there is also an additional array gain of $10 \cdot \log_{10}(1 + XPD) = 1.8$ dB. For both the pilot-trained and the semi-blind method, a Recursive Least Squares (RLS) as well as a Least Mean Squares (LMS) adaptive algorithm is considered. As already indicated in Paragraph IV-A.2, the forget factor $\lambda$ in the RLS adaptive algorithms should be optimized with respect to the normalized coherence time of the channel. Likewise, the normalized stepsize $\bar{\mu}$ in the LMS adaptive algorithms is chosen to strike the best trade-off between convergence speed on the one hand and steady-state error on the other hand. Figure 6 shows the results for a mobile speed of $v = 50$ km/h corresponding to a Doppler frequency of $F_D = \frac{v}{c} \cdot F_c = 93$ Hz, a normalized Doppler frequency of $f_D = F_D / R_s = 3.9 \cdot 10^{-4}$ cycles per symbol, or a normalized coherence time of $\tau_c = 1 / f_D = 2592$ symbols. The optimal value of the RLS forget factor proved to be $\lambda \approx 0.99$, resulting in a data memory of $\frac{1}{1-\lambda} \approx 104$ symbols, being $\frac{1}{25}$ of the normalized coherence time $\tau_c$. The optimal value of the LMS stepsize turned out to be $\bar{\mu} = 0.7$. The RLS semi-blind algorithm achieves the best performance and only incurs a 1 dB loss relative to the ideal MMSE chip equalizer. Fixing the BER at $2 \cdot 10^{-2}$, the RLS semi-blind algorithm realizes a 2 dB gain compared to the RLS pilot-trained algorithm. Similarly, the LMS semi-blind algorithm realizes a 2.5 dB gain compared to the LMS pilot-trained algorithm. The performance gain of the semi-blind algorithms with respect to the pilot-trained algorithms (for both the RLS and the LMS version) originates from the extra knowledge about the unused spreading codes. Likewise, the RLS algorithms perform similarly to their LMS counterparts in the low SNR region ($SNR \leq 4$ dB) but outperform them in the medium to high SNR region. For instance, the RLS semi-blind algorithm realizes a 1 dB gain compared to the LMS semi-blind algorithm whereas the RLS pilot-trained algorithm realizes a 1.5 dB gain compared to the LMS pilot-trained algorithm. The performance gain of the RLS algorithms with respect to the LMS algorithms (for both the semi-blind and the pilot-trained method) stems from their faster convergence speed and smaller steady-state error. Figure 7 shows the same curves but now for a mobile speed of $v = 100$ km/h corresponding to a Doppler frequency of $F_D = 186$ Hz, a normalized Doppler frequency of $f_D = 7.8 \cdot 10^{-4}$ cycles per symbol, or a normalized coherence time of $\tau_c = 1296$ symbols. In this case, the optimal value of the RLS forget factor proved to be $\lambda \approx 0.98$, resulting in a data memory of $\frac{1}{1-\lambda} \approx 52$ symbols, again being $\frac{1}{25}$ of the normalized coherence time $\tau_c$. The optimal value of the LMS stepsize turned out to be $\bar{\mu} = 0.9$. The RLS semi-blind algorithm still achieves the best performance but now incurs a 1.5 dB loss relative to the ideal MMSE chip equalizer. However, again fixing the BER at $2 \cdot 10^{-2}$, the RLS semi-blind algorithm now realizes a 3.4 dB gain compared to the RLS pilot-trained algorithm whereas the LMS semi-blind
algorithm realizes a 6 dB gain compared to the LMS pilot-trained algorithm. Likewise, the RLS semi-blind algorithm realizes a 2.8 dB gain compared to the LMS semi-blind algorithm whereas the RLS pilot-trained algorithm realizes a 5.4 dB gain compared to the LMS pilot-trained algorithm. Table III compares the complexity (in terms of multiply/accumulates) of the different adaptive chip equalizers for the previous case study, with $T := (Q_c+1)N_c = 48$ the total number of multi-channel chip equalizer taps. The detection phase is common to all adaptive chip equalizers and requires 188 Mops/s and a data-transfer bandwidth of 752 MWords/s. Performing semi-blind instead of pilot-trained updating requires 9 times more operations for the LMS version but only 6.6 times more operations for the RLS version. Selecting RLS instead of LMS updating involves 8.3 times more operations for the pilot-trained method but only 6 times more operations for the semi-blind method. In summary, the RLS semi-blind updating involves 12.5 Gops/s and a data-transfer bandwidth of 50 GWords/s.
D. Dual-cell context
This subsection discusses performance results for a dual-cell context, in which the mobile station is close to the edge between two cells. In this case, both the intra-cell and the inter-cell MUI affect the performance of the different receivers. For the desired cell, we assume the same parameters as in the beginning of Section V. The interfering cell is fully loaded with $M^2 = 15$ active users and its pilot. The ratio of the desired BS's total transmit power versus the interfering BS's total transmit power is fixed to $\sigma_{x^1}^2 / \sigma_{x^2}^2 = 10$ dB. Figure 8 compares the performance of the different adaptive chip equalizers over a time-varying channel for a mobile speed of $v = 50$ km/h. Compared to the corresponding single-cell context in Figure 6, the performance of the different chip equalizers degrades due to the inter-cell MUI. Fixing the BER at $10^{-3}$, the ideal MMSE chip equalizer with polarization diversity ($N_r = 1$, $N_p = 2$) incurs a 2.5 dB loss compared to the one with ideal antenna diversity ($N_r = 2$, $N_p = 1$). Again, the RLS semi-blind algorithm achieves the best performance and only incurs a 1.7 dB loss relative to the ideal MMSE chip equalizer. Fixing the BER at $2 \cdot 10^{-2}$, the RLS semi-blind algorithm realizes a 4.8 dB gain compared to the RLS pilot-trained algorithm. On the other hand, the LMS semi-blind and pilot-trained algorithms exhibit an error floor at $3 \cdot 10^{-2}$ and $4 \cdot 10^{-2}$, respectively.
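The mobility-related quantities used for Figures 6 to 8 follow from the carrier frequency and symbol rate of the setup. The sketch below recomputes them; it is a back-of-the-envelope aid, not part of the simulation chain. The function names and the rounded speed of light of $3 \cdot 10^8$ m/s (consistent with the 15 cm wavelength quoted above) are our assumptions.

```python
C_LIGHT = 3.0e8   # speed of light in m/s, rounded (assumption)

def mobility(v_kmh, fc_hz=2.0e9, rs_hz=240e3):
    """Return (Doppler frequency in Hz, normalized Doppler in cycles/symbol,
    normalized coherence time in symbols) for a mobile speed in km/h."""
    v = v_kmh / 3.6                       # km/h to m/s
    f_doppler = v * fc_hz / C_LIGHT       # F_D = (v / c) * F_c
    f_norm = f_doppler / rs_hz            # f_D = F_D / R_s
    return f_doppler, f_norm, 1.0 / f_norm

def forget_factor_for_memory(memory_symbols):
    """Forget factor whose data memory 1 / (1 - lambda) equals the given number of symbols."""
    return 1.0 - 1.0 / memory_symbols

results = {}
for speed in (50, 100):
    f_d, f_n, tau_c = mobility(speed)
    # Forget factor for a memory of 1/25 of the coherence time, as reported in the text
    results[speed] = (f_d, f_n, tau_c, forget_factor_for_memory(tau_c / 25))
```

Under these assumptions, a memory of one twenty-fifth of the coherence time corresponds to forget factors slightly above 0.99 and 0.98 for the two speeds, which agrees with the reported optima after rounding to two decimals.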
VI. CONCLUSION
To boost the performance of downlink chip equalization in the context of currently standardized DS-CDMA systems, we have developed a general framework for multi-channel reception (receive diversity) at the mobile through a combination of antenna diversity, polarization diversity and temporal oversampling. However, temporal oversampling generally creates highly correlated subchannels, resulting in poor performance. Antenna diversity usually requires large antenna spacings to achieve maximum diversity gain, and is therefore seldom implemented at the mobile. We have shown that polarization diversity solves this spacing problem, while keeping the same benefits as antenna diversity. Specifically, for realistic values of the relevant channel parameters, the ideal MMSE multi-channel chip equalizer with polarization diversity realizes a 3.6 dB diversity gain and a 1.8 dB array gain relative to its single-channel counterpart. To also tackle the CSI acquisition/updating problem, we have devised a pilot-trained and a semi-blind method for direct chip equalizer design. For both methods, we have derived an RLS as well as an LMS adaptive algorithm. The pilot-trained algorithms exploit the presence of a code-multiplexed pilot and are applicable as such to the UTRA FDD mode of the 3GPP standard. The semi-blind algorithms additionally exploit knowledge of the unused spreading codes and require a slight modification of the control signalling in the standard. For a realistic simulation setup, the RLS pilot-trained algorithm realizes a gain of up to 5.4 dB compared to its LMS counterpart, but also requires an 8.3 times higher complexity. Likewise, the RLS semi-blind algorithm realizes a gain of up to 3.4 dB compared to its pilot-trained counterpart, but again requires a 6.6 times higher complexity.
We can conclude that the RLS semi-blind multi-channel chip equalizer with polarization diversity is a promising technique for future wideband DS-CDMA mobiles, from a performance as well as a complexity point of view.
REFERENCES
[1] L.B. Milstein, “Wideband code division multiple access,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 8, pp. 1344–1354, August 2000.
[2] R. Price and P.E. Green, “A communication technique for multipath channels,” IRE, vol. 46, pp. 555–570, March 1958.
[3] S. Verdú, Multiuser Detection, Cambridge University Press, 1998.
[4] R. Wichman and A. Hottinen, “Multiuser detection for downlink CDMA communications in multipath fading channels,” in Proceedings of VTC. May 1997, vol. 2, pp. 572–576, IEEE.
[5] M. Honig and M.K. Tsatsanis, “Adaptive techniques for multiuser CDMA receivers,” IEEE Signal Processing Magazine, vol. 17, no. 3, pp. 49–61, May 2000.
[6] P.B. Rapajic and B.S. Vucetic, “Adaptive receiver structures for asynchronous CDMA systems,” IEEE Journal on Selected Areas in Communications, vol. 12, no. 4, pp. 685–697, April 1994.
[7] U. Madhow and M.L. Honig, “MMSE interference suppression for direct-sequence spread-spectrum CDMA,” IEEE Transactions on Communications, vol. 42, no. 12, pp. 3178–3188, December 1994.
[8] J.E. Smee and S.C. Schwartz, “Adaptive feedforward/feedback architectures for multiuser detection in high data rate wireless CDMA networks,” IEEE Transactions on Communications, vol. 48, no. 6, pp. 996–1011, June 2000.
[9] H. Holma and A. Toskala, WCDMA for UMTS, Wiley, 2001.
[10] A. Klein, “Data detection algorithms specially designed for the downlink of CDMA mobile radio systems,” in Proceedings of VTC. May 1997, vol. 1, pp. 203–207, IEEE.
[11] I. Ghauri and D.T.M. Slock, “Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences,” in Proceedings of Asilomar Conference on Signals, Systems and Computers. November 1998, vol. 1, pp. 650–654, IEEE.
[12] C.D. Frank, E. Visotsky, and U. Madhow, “Adaptive interference suppression for the downlink of a Direct Sequence CDMA system with long spreading sequences,” Kluwer Journal of VLSI Signal Processing, vol. 30, no. 3, pp. 273–291, March 2002.
[13] T.P. Krauss, W.J. Hillery, and M.D. Zoltowski, “Downlink specific linear equalization for frequency selective CDMA cellular systems,” Kluwer Journal of VLSI Signal Processing, vol. 30, no. 3, pp. 143–161, March 2002.
[14] K. Hooli, M. Latva-aho, and M. Juntti, “Multiple access interference suppression with linear chip equalizers in WCDMA downlink receivers,” in Proceedings of GLOBECOM. December 1999, pp. 467–471, IEEE.
[15] R.G. Vaughan, “Polarization diversity in mobile communications,” IEEE Transactions on Vehicular Technology, vol. 39, no. 3, pp. 177–186, August 1990.
[16] L. Mailaender, “CDMA downlink equalization with imperfect channel estimation,” in Proceedings of VTC-Spring. May 2001, vol. 3, pp. 1593–1597, IEEE.
[17] S. Buzzi and H.V. Poor, “Channel estimation and multiuser detection in long-code DS/CDMA systems,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 8, pp. 1476–1487, August 2001.
[18] S. Bhashyam and B. Aazhang, “Multiuser channel estimation and tracking for long-code CDMA systems,” IEEE Transactions on Communications, vol. 50, no. 7, pp. 1081–1090, July 2002.
[19] P. Komulainen and V. Haikola, “Adaptive filtering for fading channel estimation in WCDMA downlink,” in Proceedings of PIMRC. September 2000, vol. 1, pp. 549–553, IEEE.
[20] A.J. Weiss and B. Friedlander, “Channel estimation for DS-CDMA downlink with aperiodic spreading codes,” IEEE Transactions on Communications, vol. 47, no. 10, pp. 1561–1569, October 1999.
[21] T.P. Krauss and M.D. Zoltowski, “Blind channel identification on CDMA forward link based on dual antenna receiver at handset and cross-relation,” in Proceedings of Asilomar Conference on Signals, Systems and Computers. November 1999, vol. 1, pp. 75–79, IEEE.
[22] S. Werner and J. Lilleberg, “Downlink channel decorrelation in CDMA systems with long codes,” in Proceedings of VTC-Spring. May 1999, vol. 2, pp. 1614–1617, IEEE.
[23] K. Hooli, M. Latva-Aho, and M. Juntti, “Performance evaluation of adaptive chip-level channel equalizers in WCDMA downlink,” in Proceedings of ICC. June 2001, vol. 6, pp. 1974–1979, IEEE.
[24] P. Komulainen, M.J. Heikkilä, and J. Lilleberg, “Adaptive channel equalization and interference suppression for CDMA downlink,” in Proceedings of ISSSTA. September 2000, vol. 2, pp. 363–367, IEEE.
[25] C.D. Frank and E. Visotsky, “Adaptive interference suppression for Direct-Sequence CDMA systems with long spreading codes,” in Proceedings of Allerton Conference on Communication, Control and Computing, September 1998, pp. 411–420.
[26] F. Petré, M. Moonen, M. Engels, B. Gyselinckx, and H. De Man, “Pilot-aided adaptive chip equalizer receiver for interference suppression in DS-CDMA forward link,” in Proceedings of VTC-Fall. September 2000, vol. 1, pp. 303–308, IEEE.
[27] F. Petré, G. Leus, M. Engels, M. Moonen, and H. De Man, “Semi-blind space-time chip equalizer receivers for WCDMA forward link with code-multiplexed pilot,” in Proceedings of ICASSP. May 2001, vol. 4, pp. 2245–2248, IEEE.
[28] F. Petré, G. Leus, L. Deneire, M. Engels, and M. Moonen, “Space-time chip equalizer receivers for WCDMA downlink with code-multiplexed pilot and soft handover,” in Proceedings of GLOBECOM. November 2001, vol. 1, pp. 280–284, IEEE.
[29] M. Lenardi, A. Medles, and D.T.M. Slock, “Comparison of downlink transmit diversity schemes for RAKE and SINR maximizing receivers,” in Proceedings of ICC. June 2001, vol. 6, pp. 1679–1683, IEEE.
[30] G. Leus, F. Petré, and M. Moonen, “Space-time chip equalization for space-time coded downlink CDMA,” in Proceedings of ICC. April 2002, vol. 1, pp. 568–572, IEEE.
[31] F. Petré, D. Barbera, L. Deneire, and M. Moonen, “Combined space-time chip equalization and parallel interference cancellation for DS-CDMA downlink with spatial multiplexing,” in Proceedings of PIMRC. September 2002, vol. 3, pp. 1117–1121, IEEE.
[32] R. Fantacci and A. Galligani, “An efficient RAKE receiver architecture with pilot signal cancellation for downlink communications in DS-CDMA indoor wireless networks,” IEEE Transactions on Communications, vol. 47, no. 6, pp. 823–827, June 1999.
[33] L. Lukama, K. Konstantinou, and D.J. Edwards, “Polarization diversity performance for UMTS,” in Proceedings of International Conference on Antennas and Propagation. April 2001, vol. 1, pp. 193–197, IEE.
[34] W. Hachem, F. Desbouvries, and P. Loubaton, “MIMO channel blind identification in the presence of spatially correlated noise,” IEEE Transactions on Signal Processing, vol. 50, no. 3, pp. 651–661, March 2002.
[35] G. Leus, P. Vandaele, and M. Moonen, “Deterministic blind modulation-induced source separation for digital wireless communications,” IEEE Transactions on Signal Processing, vol. 49, no. 1, pp. 219–227, January 2001.
[36] S. Haykin, Adaptive Filter Theory, Prentice Hall, third edition, 1996.
[37] J.G. Proakis, C.M. Rader, F. Ling, C.L. Nikias, M. Moonen, and I. Proudler, Algorithms for Statistical Signal Processing, Prentice Hall, 2002.
[38] P.M. Grant, S.M. Spangenberg, D.G.M. Cruickshank, S. McLaughlin, and B. Mulgrew, “New adaptive multiuser detection technique for CDMA mobile receivers,” in Proceedings of PIMRC. September 1999, vol. 1, pp. 52–54, IEEE.
[39] K. Li and H. Liu, “A new blind receiver for downlink DS-CDMA communications,” IEEE Communications Letters, vol. 3, no. 7, pp. 193–195, July 1999.
[40] S. Mudulodu and A. Paulraj, “A blind multiuser receiver for the CDMA downlink,” in Proceedings of ICASSP. June 2000, vol. 5, pp. 2933–2936, IEEE.
[41] D.T.M. Slock and I. Ghauri, “Blind maximum SINR receiver for the DS-CDMA downlink,” in Proceedings of ICASSP. June 2000, vol. 5, pp. 2485–2488, IEEE.
[42] TSG RAN, “UE radio transmission and reception (FDD),” TS 25.101, 3GPP, December 2001.
[43] J.G. Proakis, Digital Communications, McGraw-Hill, third edition, 1995.
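[Figure: block diagram of the transmitter (TX). The user symbol sequences $s^j_1[i], \ldots, s^j_{M_j}[i]$ and the pilot symbol sequence $s^j_p[i]$ are weighted by the amplitudes $A^j_1, \ldots, A^j_{M_j}$ and $A^j_p$, spread with the codes c´_1[n], ..., c´_{M_j}[n] and c´_p[n], summed, and scrambled with $c^j_s[n]$ to give $x^j[n]$.]
Fig. 1. DS-CDMA downlink transmission scheme of the j-th base station.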
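[Figure: block diagram of the receiver. The signals $y_1[n], \ldots, y_{N_c}[n]$ of the receive branches RX 1, ..., RX $N_c$ enter the chip equalizer (Chip EQ). Its output $\hat{x}^j[n]$ is descrambled with $c^j_s[n]$, despread with c´_m[−n], and scaled by $1/A^j_m$ to give the symbol estimate $\hat{s}^j_m[i]$.]
Fig. 2. Chip equalizer receiver structure of the mobile station.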
Algorithm            updating                                                           detection
LMS pilot-trained    $(N + 4)T$                                                         $N(T + 1)$
RLS pilot-trained    $3T^2 + (N + 5)T$                                                  $N(T + 1)$
LMS semi-blind       $(\bar{M}_j + 1)(N + 4)T$                                          $N(T + 1)$
RLS semi-blind       $(2\bar{M}_j + 3)T^2 + [(N + 3)\bar{M}_j + (N + 5)]T$              $N(T + 1)$

TABLE I
OPERATION COUNT OF DIFFERENT ADAPTIVE CHIP EQUALIZERS
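The expressions of Table I can be evaluated with a short script. The following is a minimal sketch rather than code from the paper: the function name `operation_counts` and the argument name `M_bar` (standing for $\bar{M}_j$) are hypothetical, and N, T and $\bar{M}_j$ are assumed to carry the meaning given to them in the complexity analysis.

```python
def operation_counts(N, T, M_bar):
    """Evaluate the operation counts of Table I.

    N, T and M_bar (the table's M-bar_j) are taken as defined in the
    paper's complexity analysis; this helper is only an illustration.
    Returns {algorithm: (updating, detection)}.
    """
    detection = N * (T + 1)  # identical for all four algorithms
    return {
        "LMS pilot-trained": ((N + 4) * T, detection),
        "RLS pilot-trained": (3 * T**2 + (N + 5) * T, detection),
        "LMS semi-blind": ((M_bar + 1) * (N + 4) * T, detection),
        "RLS semi-blind": (
            (2 * M_bar + 3) * T**2 + ((N + 3) * M_bar + (N + 5)) * T,
            detection,
        ),
    }
```

One consistency check on the table is that, for $\bar{M}_j = 0$, both semi-blind expressions reduce to the corresponding pilot-trained ones.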
Excess delay [ns]     0     260    521    781
Average power [dB]    0     -3     -6     -9

TABLE II
POWER/DELAY PROFILE OF MULTI-PATH CHANNEL
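To make the profile of Table II easier to use in a simulation, the sketch below converts the average powers from dB to normalized linear weights and computes the mean excess delay and RMS delay spread with the usual power-weighted formulas. The helper names are hypothetical. The chip rate of 3.84 Mchip/s used in `delays_in_chips` is an assumption (the UMTS value, not stated in the table); under that assumption the four taps fall on consecutive chip instants.

```python
import math

# Values copied from Table II.
delays_ns = [0.0, 260.0, 521.0, 781.0]
powers_db = [0.0, -3.0, -6.0, -9.0]


def linear_profile(powers_db):
    """Convert average tap powers from dB to linear scale, normalized to sum 1."""
    linear = [10 ** (p / 10) for p in powers_db]
    total = sum(linear)
    return [p / total for p in linear]


def delay_statistics(delays_ns, powers_db):
    """Return (mean excess delay, RMS delay spread) in ns."""
    weights = linear_profile(powers_db)
    mean = sum(w * d for w, d in zip(weights, delays_ns))
    second_moment = sum(w * d * d for w, d in zip(weights, delays_ns))
    return mean, math.sqrt(second_moment - mean**2)


def delays_in_chips(delays_ns, chip_rate_hz=3.84e6):
    """Round each delay to the nearest chip; the chip rate is an assumed value."""
    chip_ns = 1e9 / chip_rate_hz
    return [round(d / chip_ns) for d in delays_ns]
```

For example, `delays_in_chips(delays_ns)` maps the four excess delays to chip indices 0, 1, 2 and 3.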
[Figure: average BER (logarithmic axis, 10^0 down to 10^-4) versus average SNR [dB] (0 to 16). Curves: Nr = 1, Ns = 1; Nr = 1, Ns = 2; Nr = 2, Ns = 1; Nr = 2, Ns = 2; single-user bound.]
Fig. 3. Performance of oversampling diversity versus antenna diversity for the ideal RAKE receiver (dashed lines) and the ideal MMSE chip equalizer receiver (solid lines). Nr is the number of receive antennas, Ns is the temporal oversampling factor. The single-user bound is the optimal performance of an interference-free system.
[Figure: average BER (logarithmic axis, 10^0 down to 10^-4) versus average SNR [dB] (0 to 16). Curves: Nr = 2; XPD = −3 dB; XPD = −6 dB; XPD = −9 dB; XPD = −12 dB.]
Fig. 4. Influence of the Cross Polarization Discrimination (XPD) on the performance of ideal MMSE multi-channel chip equalization with polarization diversity. The Cross Correlation Coefficient (XCC) is fixed to 0. The performance of ideal antenna diversity with perfect power balance and decorrelation is shown as a reference.
[Figure: average BER (logarithmic axis, 10^0 down to 10^-4) versus average SNR [dB] (0 to 16). Curves: Nr = 2; XCC = 0.1; XCC = 0.3; XCC = 0.5; XCC = 0.7.]
Fig. 5. Influence of the Cross Correlation Coefficient (XCC) on the performance of ideal MMSE multi-channel chip equalization with polarization diversity. The Cross Polarization Discrimination (XPD) is fixed to 0 dB. The performance of ideal antenna diversity with perfect decorrelation and power balance is shown as a reference.
[Figure: panel title v = 50 km/h. Average BER (logarithmic axis, 10^0 down to 10^-4) versus average SNR [dB] (0 to 16). Curves: LMS pilot-trained; RLS pilot-trained; LMS semi-blind; RLS semi-blind; Ideal MMSE Np = 2; Ideal MMSE Nr = 2.]
Fig. 6. Performance of the different adaptive multi-channel chip equalizers with polarization diversity over a time-varying channel for a medium mobile speed of v = 50 km/h. The XPD is −3 dB, the XCC is 0.7. The RLS forget factor is λ = 0.99, the LMS stepsize is µ̄ = 0.7. The performance of the ideal MMSE chip equalizer with polarization diversity and ideal antenna diversity is shown as a reference.
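As a reminder of what the step size quoted in the caption controls, the sketch below shows a generic textbook normalized LMS (NLMS) update on a real-valued toy identification problem. It is not the pilot-trained or semi-blind algorithm of this paper. It assumes that the bar on µ denotes a normalized step size, and the function `nlms_identify`, the toy filter `h` and the noiseless white input are all illustrative choices.

```python
import random


def nlms_identify(x, d, num_taps, mu_bar, eps=1e-8):
    """Generic NLMS: adapt w so that the inner product of w and u[n] tracks d[n]."""
    w = [0.0] * num_taps
    u = [0.0] * num_taps  # regressor, most recent input sample first
    for xn, dn in zip(x, d):
        u = [xn] + u[:-1]                               # shift in the new sample
        e = dn - sum(wi * ui for wi, ui in zip(w, u))   # a priori error
        norm = eps + sum(ui * ui for ui in u)           # regularized input energy
        w = [wi + mu_bar * e * ui / norm for wi, ui in zip(w, u)]
    return w


random.seed(0)
h = [1.0, -0.5, 0.25]  # hypothetical filter to identify
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w = nlms_identify(x, d, num_taps=3, mu_bar=0.7)
```

A larger step size tracks a changing target faster at the cost of more gradient noise once observation noise is present, which this noiseless toy example does not show.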
[Figure: panel title v = 100 km/h. Average BER (logarithmic axis, 10^0 down to 10^-4) versus average SNR [dB] (0 to 16). Curves: LMS pilot-trained; RLS pilot-trained; LMS semi-blind; RLS semi-blind; Ideal MMSE Np = 2; Ideal MMSE Nr = 2.]
Fig. 7. Performance of the different adaptive multi-channel chip equalizers with polarization diversity over a time-varying channel for a high mobile speed of v = 100 km/h. The XPD is −3 dB, the XCC is 0.7. The RLS forget factor is λ = 0.98, the LMS stepsize is µ̄ = 0.9. The performance of the ideal MMSE chip equalizer with polarization diversity and ideal antenna diversity is shown as a reference.
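The forget factors quoted in the captions of Figs. 6 and 7 can be compared through a common rule of thumb, not taken from the paper: an exponential window with weights $\lambda^k$ has a total weight of $1/(1-\lambda)$, which is often read as its effective memory in update steps.

```latex
\sum_{k=0}^{\infty} \lambda^{k} = \frac{1}{1-\lambda}
\quad\Rightarrow\quad
\lambda = 0.99:\ \frac{1}{1-0.99} = 100,
\qquad
\lambda = 0.98:\ \frac{1}{1-0.98} = 50 .
```

Under this reading, the smaller forget factor chosen for v = 100 km/h halves the effective memory relative to v = 50 km/h, in line with the faster channel variation.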
Algorithm            updating    detection
LMS pilot-trained    230 M       188 M
RLS pilot-trained    1.9 G       188 M
LMS semi-blind       2.1 G       188 M
RLS semi-blind       12.5 G      188 M

TABLE III
COMPLEXITY COMPARISON OF DIFFERENT ADAPTIVE CHIP EQUALIZERS
[Figure: panel title v = 50 km/h. Average BER (logarithmic axis, 10^0 down to 10^-4) versus average SNR [dB] (0 to 16). Curves: LMS pilot-trained; RLS pilot-trained; LMS semi-blind; RLS semi-blind; Ideal MMSE Np = 2; Ideal MMSE Nr = 2.]
Fig. 8. Performance of the different adaptive multi-channel chip equalizers with polarization diversity in a dual-cell context. The ratio of the desired BS’s total transmit power versus the interfering BS’s total transmit power is 10 dB. The XPD is −3 dB, the XCC is 0.7. Time-varying channel with a medium mobile speed of v = 50 km/h. The performance of the ideal MMSE chip equalizer with polarization diversity and ideal antenna diversity is shown as a reference.