IEEE 6th Int. Symp. on Spread-Spectrum Tech. & Appli., NJIT, New Jersey, USA, Sept. 6-8, 2000
Parallel Interference Cancellation with Reduced Complexity for Multi-Carrier Spread-Spectrum FCDMA

Achim Nahler, Ralf Irmer, and Gerhard P. Fettweis¹
Mannesmann Mobilfunk Chair for Mobile Communications Systems, Dresden University of Technology, D-01062 Dresden, Germany
e-mail:
[email protected] Abstract — The performance of code division multiple access (CDMA) systems is limited by multiple access interference (MAI) which can be overcome by multi-user detection schemes. Interference cancellation seems to be a good compromise in terms of performance and computational complexity compared to optimum multi-user detection or linear transformations e. g. based on the MMSE criterion. In this paper, the performance and complexity of parallel interference cancellation (PIC) are investigated for a multi-carrier spread-spectrum frequency code division multiple access (MC-SS FCDMA) system. Evaluating the structure of MAI, the PIC is modified to reduce its complexity to the order of the single-user detector while keeping the bit error rate in the range of the conventional PIC value.
I. Motivation

In the future, wireless solutions will also be demanded for many indoor applications, e.g. for ad-hoc networks or LANs. Especially for the license-free ISM bands, where spread-spectrum techniques are required by the national communications commissions, direct-sequence spread-spectrum (DS-SS) systems are candidates for such applications with low signal processing complexity. In this paper MC-SS, introduced in the early nineties [1], [2], is investigated. For MC-SS a rectangular-shaped spectrum is achieved through the modulation principle. Hence additional pulse-shaping can be avoided. This is a significant implementation advantage compared to DS-SS based on binary pseudo-random short codes. The typical multi-carrier problem of high amplitude fluctuations can be overcome through a proper code design [3]. To lower the implementation costs, it is useful that the data of all users are spread by the same code signal and the user distinction is obtained by a slightly different main carrier for each user [4], [5]. Hence multiuser capability is achieved through FCDMA.

It is well known that the CDMA system performance is limited by MAI arising from non-ideal cross-correlations between the signature sequences of several users. To overcome this drawback, in [6] the Viterbi algorithm is applied to multi-user detection that is optimum in terms of the bit error rate (BER). The complexity of that algorithm increases exponentially with the number of users. Hence, research has concentrated on sub-optimum algorithms with lower complexity. Linear algorithms such as the zero-forcing detector or the minimum-mean-square-error detector have been adapted to multiuser detection. Their complexity still grows polynomially with the number of users. For non-linear interference cancellation the complexity is only a linear function of the number of users. But even for such algorithms the computational effort is enormous in realistic scenarios. Hence, algorithms with less complexity are needed.

To reduce the complexity of the PIC, a principle of diversity reception is adapted to IC. In a RAKE receiver, only a few strong paths are combined to achieve a diversity gain near the maximum gain, and many weak paths are neglected to constrain the RAKE complexity. For IC this means that only a few strong interferers are cancelled to improve the performance to values close to the single-user matched filter bound. If only these strong interferers are cancelled, the complexity is reduced dramatically but the performance decreases only slightly compared to complete interference cancellation. Hence, this modification of the algorithm leads to tractable complexity.

The paper is organized as follows. In Section II, MC-SS and the FCDMA scheme are explained, basic system presuppositions are stated and the transmission model is given in vector-matrix notation. In Section III multistage parallel interference cancellation is explained and two different implementation versions are introduced. Section IV deals with a detailed complexity analysis of the implemented PICs. In Section V a novel strategy is given to reduce the complexity dramatically. In Section VI the performance of this low-complexity interference canceller is shown and compared to "full" interference cancellation. Finally, Section VII concludes the work.

¹ This work was partly supported by Deutsche Forschungsgemeinschaft contract Fe 423/2
II. Transmission Model

In MC-SS the data signal with symbol duration $T_S$ is multiplied by the complex-valued spreading signal

\[ c(t) = \sum_{k=0}^{K-1} C_k \, e^{j 2\pi \frac{k}{T_S} t}, \tag{1} \]
where the complex-valued coefficients $C_k$ have the same magnitude $|C_k| = 1/\sqrt{K}$ and differ only in their phase. To overcome the high amplitude fluctuations of the transmit signal, the phases should be adjusted according to [3]. As mentioned above, the data of all users are spread by the same signal and user separation is achieved by assigning a slightly different main carrier to each user. To guarantee sub-carrier orthogonality [4], the $u$th user's frequency offset should be $(u-1)/T_S$. Hence, the transmit signal of the $u$th user is given by

\[ s^{(u)}(t) = d^{(u)}(t) \cdot c(t) \cdot e^{j 2\pi \frac{u-1}{T_S} t} = d^{(u)}(t) \cdot c^{(u)}(t). \tag{2} \]
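The effect of the coefficient phases on the amplitude fluctuations of Eq. (1) can be illustrated with a short numerical sketch. The quadratic phase law used below is only an assumed stand-in for illustration; the actual code design is the one referenced in [3], and the function names are hypothetical.

```python
import cmath
import math

K = 16  # number of sub-carriers (the spreading factor used later in the paper)

def spreading_signal(coeffs, t_over_ts):
    """Evaluate c(t) of Eq. (1) at the normalized time t/T_S."""
    return sum(c * cmath.exp(2j * math.pi * k * t_over_ts)
               for k, c in enumerate(coeffs))

def peak_amplitude(coeffs, points=1024):
    """Largest |c(t)| on a dense grid covering one symbol duration."""
    return max(abs(spreading_signal(coeffs, i / points)) for i in range(points))

# Equal phases: all sub-carriers add up coherently at t = 0.
equal = [1 / math.sqrt(K)] * K

# ASSUMPTION: quadratic phases, chosen only for illustration.
# This is not necessarily the code design of [3].
quadratic = [cmath.exp(1j * math.pi * k * k / K) / math.sqrt(K)
             for k in range(K)]

peak_equal = peak_amplitude(equal)
peak_quadratic = peak_amplitude(quadratic)
print(f"peak |c(t)| with equal phases:     {peak_equal:.3f}")
print(f"peak |c(t)| with quadratic phases: {peak_quadratic:.3f}")
```

Both coefficient sets have the same magnitude $1/\sqrt{K}$ per sub-carrier, so any difference in the peak comes from the phases alone.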
Furthermore, it is assumed that a) the transmission is burst-wise, so that a TDD mode or TDMA scheme can be used to enhance the flexibility of supported data rates, b) burst synchronization is achieved to simplify the channel estimation procedure and c) the channel impulse response is constant for the duration of one burst because of the envisaged indoor applications and the resulting large coherence time compared to the burst length (see also Section VI).

Fig. 1: Receiver structure containing a RAKE followed by L PIC stages

In the sequel, the discrete-time model is explained for that MC-SS FCDMA system. The data $d_n^{(u)} \in \{-1, 1\}$ of one burst of $N$ symbols of the $u$th user are arranged in the vector

\[ \mathbf{d}^{(u)} = \left( d_1^{(u)}, d_2^{(u)}, \ldots, d_N^{(u)} \right)^T \tag{3} \]

where "T" denotes matrix transpose; the data of all users are

\[ \mathbf{d} = \left( \mathbf{d}^{(1)T}, \mathbf{d}^{(2)T}, \ldots, \mathbf{d}^{(U)T} \right)^T. \tag{4} \]
Because of the FCDMA scheme the total system bandwidth is twice the bandwidth of one user. Therefore the incoming signal has to be sampled with an oversampling factor of 2 to avoid aliasing. With the chip duration $T_C = T_S/K$ the sampled spreading signal of the $u$th user is

\[ c^{(u)}(nT_C/2) = \sum_{k=0}^{K-1} C_k \, e^{j 2\pi \frac{k+u-1}{T_S} \frac{nT_C}{2}}, \quad n = 0, 1, \ldots, 2K-1 \tag{5} \]

and in vector notation

\[ \mathbf{c}^{(u)} = \left( c^{(u)}(0), c^{(u)}(T_C/2), \ldots, c^{(u)}((2K-1)T_C/2) \right)^T. \tag{6} \]
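As a minimal sketch of Eqs. (5) and (6), the sampled codes of two users can be generated and correlated numerically. The coefficient phases are again an illustrative assumption (the paper takes them from [3]), and the helper names are hypothetical. The sketch shows that the codes of neighbouring users are not orthogonal, which is the origin of the MAI discussed in this paper.

```python
import cmath
import math

K = 16  # spreading factor (number of sub-carriers)

# ASSUMPTION: illustrative quadratic phases with |C_k| = 1/sqrt(K);
# the paper adjusts the phases according to the code design in [3].
C = [cmath.exp(1j * math.pi * k * k / K) / math.sqrt(K) for k in range(K)]

def sampled_code(u):
    """Samples c^(u)(n*T_C/2), n = 0..2K-1, of Eq. (5) for user u = 1, 2, ..."""
    # With T_C = T_S/K the exponent reduces to j*2*pi*(k+u-1)*n/(2K).
    return [sum(C[k] * cmath.exp(2j * math.pi * (k + u - 1) * n / (2 * K))
                for k in range(K))
            for n in range(2 * K)]

def correlation(a, b):
    """Normalized inner product (1/2K) * sum_n conj(a_n) * b_n."""
    return sum(x.conjugate() * y for x, y in zip(a, b)) / len(a)

c1, c2 = sampled_code(1), sampled_code(2)
auto = correlation(c1, c1)    # average power of one user's code
cross = correlation(c1, c2)   # cross-correlation between users 1 and 2

# The sub-carriers are mutually orthogonal over the 2K samples, so only
# coefficient pairs sharing a frequency survive: sum_k conj(C_k) * C_(k-1).
expected_cross = sum(C[k].conjugate() * C[k - 1] for k in range(1, K))

print(f"|auto-correlation|  = {abs(auto):.4f}")
print(f"|cross-correlation| = {abs(cross):.4f}")
```

The cross-correlation depends only on the coefficient sequence shifted by the user offset, which is why the code design influences both the amplitude fluctuations and the interference between users.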
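Forming with the zero vector $\mathbf{0}_M = (0_m) = (0)$, $m = 1, \ldots, M$, the $2UKN \times UN$ matrix

\[ \mathbf{c} = \begin{pmatrix} \mathbf{0}_0 & \mathbf{0}_{2K} & \ldots & \mathbf{0}_{2K(UN-(N-1))} & \ldots & \mathbf{0}_{2K(UN-1)} \\ \mathbf{c}^{(1)} & \mathbf{c}^{(1)} & \ldots & \mathbf{c}^{(1)} & \ldots & \mathbf{c}^{(U)} \\ \mathbf{0}_{2K(UN-1)} & \mathbf{0}_{2K(UN-2)} & \ldots & \mathbf{0}_{2K((U-1)N)} & \ldots & \mathbf{0}_0 \end{pmatrix}, \tag{7} \]

the synchronous transmit signals of all users are then

\[ \mathbf{s} = \mathbf{c} \cdot \mathbf{d}. \tag{8} \]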
The $u$th user's channel impulse response, based on an ideal estimation at the chip rate, is given by

\[ \mathbf{h}^{(u)} = \left( h_1^{(u)}, h_2^{(u)}, \ldots, h_P^{(u)} \right)^T. \tag{9} \]

Considering the oversampling factor of two, the channel impulse response becomes

\[ \mathbf{h}_O^{(u)} = \left( h_1^{(u)}, 0, h_2^{(u)}, 0, \ldots, h_P^{(u)}, 0 \right)^T. \tag{10} \]
Hence, channel filtering of one burst is described by the $2(KN+P-1) \times 2KN$ matrix

\[ \mathbf{h}_B^{(u)} = \begin{pmatrix} \mathbf{0}_0 & \mathbf{0}_1 & \mathbf{0}_2 & \ldots \\ \mathbf{h}_O^{(u)} & \mathbf{h}_O^{(u)} & \mathbf{h}_O^{(u)} & \ldots \\ \mathbf{0}_{2KN-2} & \mathbf{0}_{2KN-1} & \mathbf{0}_{2KN} & \ldots \end{pmatrix}. \tag{11} \]

With the $2(KN+P-1) \times 2UKN$ matrix

\[ \mathbf{h} = \left( \mathbf{h}_B^{(1)}, \mathbf{h}_B^{(2)}, \ldots, \mathbf{h}_B^{(U)} \right) \tag{12} \]

the received signal vector becomes

\[ \mathbf{r} = \mathbf{h} \cdot \mathbf{c} \cdot \mathbf{d} + \mathbf{w} \tag{13} \]

where $\mathbf{w}$ is white Gaussian noise, hence $E(\mathbf{w} \cdot \mathbf{w}^H) = N_0 \cdot \mathbf{I}$ with the $2(KN+P-1) \times 2(KN+P-1)$ identity matrix $\mathbf{I}$, and "H" denotes Hermitian transpose.

III. Parallel Interference Cancellation

In this section the receiver architecture and two different implementations are explained. As mentioned in the introduction, IC is an interesting sub-optimum multiuser detection scheme because of its performance gains and its moderate complexity compared to other schemes. Successive IC, proposed in [7], can reach the single-user bound, but all users have to transmit with powers according to a specific power distribution to guarantee the same bit error rate. Another disadvantage of this scheme is the user-specific detection delay, which arises because the users are detected serially. These drawbacks are overcome by parallel interference cancellation (PIC), introduced in [8] and applied to MC-SS FCDMA in this paper.

As shown in Fig. 1, the receiver consists of a RAKE for each user followed by L PIC stages. The basic idea is to take the estimated data obtained by the initial RAKE, regenerate the transmitted signals to calculate the MAI that each user experiences from all other users, and subtract it from the received signals. After this cancellation step a new data estimate is based on the interference-reduced signal, leading to more reliable data estimates than in the initial stage. This procedure is repeated until the estimated data do not change from the previous stage to the current stage.

It is obvious that the initial estimated data based on the RAKE are not very reliable. Hence, the cancellation scheme is also fed with wrong data and does not deliver interference-free signals. To improve the performance further, it was proposed in [9] to perform a partial cancellation by a tentative decision device that weights the cancellation terms according to their reliability. Because it is not yet known how to calculate the optimum tentative decision devices for multi-path scenarios, this paper concentrates on PIC using a hard decision as tentative decision, which leads to full cancellation.

The RAKE outputs

\[ \hat{\mathbf{d}}_{(0)} = \mathbf{c}^H \cdot \mathbf{h}^H \cdot \mathbf{r} = \mathbf{c}^H \cdot \mathbf{h}^H \cdot \mathbf{h} \cdot \mathbf{c} \cdot \mathbf{d} + \mathbf{c}^H \cdot \mathbf{h}^H \mathbf{w} \tag{14} \]

are the initial values for the IC algorithm. In Fig. 2 the $\ell$th PIC stage is shown. The tentative hypotheses

\[ \tilde{\mathbf{d}}_{(\ell-1)} = \mathrm{sgn}\left( \hat{\mathbf{d}}_{(\ell-1)} \right) \tag{15} \]

are re-spread and passed through the multi-path channel. Then, the received signals are cleared of the MAI estimate and processed by a RAKE. The $\ell$th stage outputs

\[ \hat{\mathbf{d}}_{(\ell)} = \mathbf{c}^H \cdot \mathbf{h}_s^H \left( \boldsymbol{\Sigma} \cdot \mathbf{r} - \mathbf{F}_{WB} \cdot \tilde{\mathbf{d}}_{(\ell-1)} \right) \tag{16} \]

with the cancellation matrix

\[ \mathbf{F}_{WB} = \boldsymbol{\Sigma} \cdot \mathbf{h} \cdot \mathbf{c} - \mathbf{h}_s \cdot \mathbf{c} = \left( \boldsymbol{\Sigma} \cdot \mathbf{h} - \mathbf{h}_s \right) \cdot \mathbf{c}, \tag{17} \]
where

\[ \boldsymbol{\Sigma} = \left( \mathbf{I}^{(1)}, \mathbf{I}^{(2)}, \ldots, \mathbf{I}^{(U)} \right)^T \tag{18} \]

contains the $2(KN+P-1) \times 2(KN+P-1)$ identity matrices $\mathbf{I}^{(u)}$, and

\[ \mathbf{h}_s = \begin{pmatrix} \mathbf{h}_B^{(1)} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{h}_B^{(2)} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{h}_B^{(U)} \end{pmatrix} \tag{19} \]

contains the matrices $\mathbf{h}_B^{(u)}$ defined in Eq. 11 and the zero matrices $\mathbf{0} = (0_{m,n}) = (0)$, $m = 1, \ldots, 2(KN+P-1)$, $n = 1, \ldots, 2KN$.

Fig. 2: Principle of the $\ell$th PIC stage

In what follows, this implementation is called wide-band PIC (WB-PIC) because the data are re-spread to wide-band signals and the wide-band channels are emulated explicitly. In contrast, another implementation exists with performance equivalent to the WB-PIC. In that implementation the estimated MAI, which is nothing other than the path-weighted data-dependent cross-correlations, is subtracted from the RAKE outputs. The outputs of the $\ell$th stage are

\[ \hat{\mathbf{d}}_{(\ell)} = \hat{\mathbf{d}}_{(0)} - \mathbf{F}_{NB} \cdot \tilde{\mathbf{d}}_{(\ell-1)} \tag{20} \]

with the cancellation matrix

\[ \mathbf{F}_{NB} = \mathbf{h}_{NB}^H \cdot \mathbf{R} \cdot \mathbf{h}_{NB} \tag{21} \]

where $\mathbf{h}_{NB}$ is the $PUN \times UN$ matrix

\[ \mathbf{h}_{NB} = \begin{pmatrix} \mathbf{0}_0 & \mathbf{0}_P & \ldots & \mathbf{0}_{P(UN-(N-1))} & \ldots & \mathbf{0}_{P(UN-1)} \\ \mathbf{h}^{(1)} & \mathbf{h}^{(1)} & \ldots & \mathbf{h}^{(1)} & \ldots & \mathbf{h}^{(U)} \\ \mathbf{0}_{P(UN-1)} & \mathbf{0}_{P(UN-2)} & \ldots & \mathbf{0}_{P((U-1)N)} & \ldots & \mathbf{0}_0 \end{pmatrix} \tag{22} \]

and $\mathbf{R}$ is the $PUN \times PUN$ correlation matrix

\[ \mathbf{R} = \begin{pmatrix} \mathbf{0} & \mathbf{r}_{1,2} & \cdots & \mathbf{r}_{1,U} \\ \mathbf{r}_{2,1} & \mathbf{0} & \cdots & \mathbf{r}_{2,U} \\ \vdots & & \ddots & \vdots \\ \mathbf{r}_{U,1} & \cdots & \mathbf{r}_{U,U-1} & \mathbf{0} \end{pmatrix} \tag{23} \]

with the $PN \times PN$ matrices

\[ \mathbf{r}_{\ell,u} = \begin{pmatrix} \boldsymbol{\varphi}^{(2)}_{\ell,u} & \boldsymbol{\varphi}^{(3)}_{\ell,u} & \mathbf{0} & \cdots & \mathbf{0} \\ \boldsymbol{\varphi}^{(1)}_{\ell,u} & \boldsymbol{\varphi}^{(2)}_{\ell,u} & \boldsymbol{\varphi}^{(3)}_{\ell,u} & \cdots & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\varphi}^{(1)}_{\ell,u} & \boldsymbol{\varphi}^{(2)}_{\ell,u} & \cdots & \mathbf{0} \\ \vdots & & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\varphi}^{(1)}_{\ell,u} & \boldsymbol{\varphi}^{(2)}_{\ell,u} \end{pmatrix}. \tag{24} \]

The $P \times P$ upper-triangular matrix

\[ \boldsymbol{\varphi}^{(1)}_{\ell,u} = \left( \varphi^{(1)}_{m,n} \right)_{\ell,u} \tag{25} \]

\[ \left( \varphi^{(1)}_{m,n} \right)_{\ell,u} = \frac{1}{K} \sum_{k=0}^{n-m-1} c^{(\ell)*}(kT_C) \, c^{(u)}\big( (K-(n-m)+k) T_C \big) \tag{26} \]

with $m = 1, \ldots, P-1$ and $n = m+1, \ldots, P$, the $P \times P$ matrix

\[ \boldsymbol{\varphi}^{(2)}_{\ell,u} = \left( \varphi^{(2)}_{m,n} \right)_{\ell,u} \tag{27} \]

\[ \left( \varphi^{(2)}_{m,n} \right)_{\ell,u} = \frac{1}{K} \begin{cases} \sum\limits_{k=0}^{m-n-1} c^{(\ell)*}(kT_C) \, c^{(u)}\big( (K-(m-n)+k) T_C \big) & n < m \\ \sum\limits_{k=n-m}^{K-1} c^{(\ell)*}(kT_C) \, c^{(u)}\big( (k-(n-m)) T_C \big) & n \geq m \end{cases} \tag{28} \]

and the $P \times P$ lower-triangular matrix

\[ \boldsymbol{\varphi}^{(3)}_{\ell,u} = \left( \varphi^{(3)}_{m,n} \right)_{\ell,u} \tag{29} \]

\[ \left( \varphi^{(3)}_{m,n} \right)_{\ell,u} = \frac{1}{K} \sum_{k=m-n}^{K-1} c^{(\ell)*}(kT_C) \, c^{(u)}\big( (k-(m-n)) T_C \big) \tag{30} \]

with $n = 1, \ldots, P-1$ and $m = n+1, \ldots, P$ describe the partial cross-correlations with the previous, the current and the succeeding symbol, respectively. This implementation works at the symbol rate and is called narrow-band PIC (NB-PIC).

IV. Complexity Analysis

The two implementations of the same algorithm, WB-PIC and NB-PIC, are investigated concerning their complexity in terms of real-valued operations. First, the complexity of a RAKE is calculated, because the RAKE is part of the receiver architecture and serves as single-user detector for comparison purposes. A bank of RAKEs, each having $F$ fingers, needs

\[ O_{R,CM}(U) = F(2K+1)U \tag{31} \]

\[ O_{R,CA}(U) = F(2K+1)U \tag{32} \]

complex-valued multiplications and additions, respectively, to detect one symbol per user for $U$ users. One complex-valued multiplication needs four real-valued multiplications plus two real-valued additions. One complex-valued addition needs two real-valued additions. Then

\[ O_R(U) = 8F(2K+1)U \tag{33} \]

real-valued operations are needed. For one WB-PIC stage

\[ O_{W1,CM}(U) = \overbrace{2KU}^{a)} + \overbrace{2KPU}^{b)} + \overbrace{F(2K+1)U}^{e)} \tag{34} \]

\[ O_{W1,CA}(U) = \overbrace{PU}^{b)} + \overbrace{2KU}^{c)} + \overbrace{2KU}^{d)} + \overbrace{F(2K+1)U}^{e)} \tag{35} \]

complex-valued multiplications and additions, respectively, are required for a) re-spreading, b) channel emulation, c) interference summation, d) interference cancellation and e) the final RAKE, and hence

\[ O_{W1}(U) = \big( 8F(2K+1) + 12K(P+1) + 2(P+4K) \big) U \tag{36} \]

real-valued operations are needed. The complexity of the NB-PIC is calculated in the same manner. For subtracting the interfering cross-correlations of each path of all other users from each path (of each user),

\[ O_{N1,CM}(U) = F(P(U-1)+1)U \tag{37} \]

\[ O_{N1,CA}(U) = F(P(U-1)+1)U \tag{38} \]

complex-valued multiplications and additions and thus

\[ O_{N1}(U) = 8F(P(U-1)+1)U \tag{39} \]

real-valued operations are needed. Finally, the total complexity of the receiver is derived. For a WB-PIC and an NB-PIC with $L$ stages

\[ O_{WB\text{-}PIC}(U, L) = O_R(U) + L \cdot O_{W1}(U) \tag{40} \]

\[ O_{NB\text{-}PIC}(U, L) = O_R(U) + L \cdot O_{N1}(U) \tag{41} \]

real-valued operations are required. From Eqs. 36 and 39, it is clear that the complexity of one WB-PIC stage increases linearly with the number of users, whereas the complexity of one NB-PIC stage increases quadratically with the number of users. Figure 3 shows the number of operations as a function of the number of users $U$ with $U = 1, \ldots, 16$ for a typical scenario that is also used for the performance measurements in Section VI. The (indoor) channel impulse response is resolved in four taps, so that $F = P = 4$, and the spreading factor is $K = 16$.

Fig. 3: Number of real-valued operations (in units of $10^4$) vs. the number of users U: RAKE (circles), WB-PIC (asterisks) and NB-PIC; 1 stage (solid lines), 2 stages (dashed lines)

For comparison purposes, the single-user RAKE complexity is depicted as a lower bound. Furthermore, it is interesting that in this specific indoor scenario the NB-PIC is less complex than the WB-PIC. However, it has to be noted that the total complexity is enormous, even though it increases only linearly with the number of users. Therefore, it is necessary to modify the algorithms in such a way that the complexity is decreased significantly.
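The operation counts of this section can be evaluated directly. The following sketch implements Eqs. (33), (36), (39), (40) and (41) for the scenario stated above ($F = P = 4$, $K = 16$); the function names are hypothetical and chosen only for this illustration.

```python
def rake_ops(U, F, K):
    """Real-valued operations of a bank of RAKEs, Eq. (33)."""
    return 8 * F * (2 * K + 1) * U

def wb_stage_ops(U, F, P, K):
    """Real-valued operations of one WB-PIC stage, Eq. (36)."""
    return (8 * F * (2 * K + 1) + 12 * K * (P + 1) + 2 * (P + 4 * K)) * U

def nb_stage_ops(U, F, P):
    """Real-valued operations of one NB-PIC stage, Eq. (39)."""
    return 8 * F * (P * (U - 1) + 1) * U

def total_ops(U, L, F, P, K, wideband):
    """Receiver complexity with L PIC stages, Eqs. (40) and (41)."""
    stage = wb_stage_ops(U, F, P, K) if wideband else nb_stage_ops(U, F, P)
    return rake_ops(U, F, K) + L * stage

# Scenario of Fig. 3: four channel taps and RAKE fingers, spreading factor 16.
F = P = 4
K = 16

print("U  RAKE  WB-PIC(1)  NB-PIC(1)  WB-PIC(2)  NB-PIC(2)")
for U in (1, 4, 8, 12, 16):
    print(U,
          rake_ops(U, F, K),
          total_ops(U, 1, F, P, K, wideband=True),
          total_ops(U, 1, F, P, K, wideband=False),
          total_ops(U, 2, F, P, K, wideband=True),
          total_ops(U, 2, F, P, K, wideband=False))
```

Per user, one WB-PIC stage costs a constant number of operations, while one NB-PIC stage grows with $U$; for these parameters the NB-PIC stage stays cheaper over the plotted range $U \leq 16$.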
V. Complexity Reduction

In this section a novel approach is given to strongly reduce the complexity of the PIC compared to the conventional PIC described in the previous section, which is called full-complexity PIC (FC-PIC) in what follows. It is known from diversity reception that only a few strong paths have to be combined with a RAKE receiver to achieve a diversity gain near the optimum gain. Many weak paths are neglected to limit the RAKE complexity. This idea is adapted to IC. MAI can still be reduced significantly if only a few strong interferers are cancelled, meaning that performance can be strongly improved at low additional complexity. The NB-PIC is better suited for the realization of that idea than the WB-PIC. If $U$ users are active, each user's channel has $P$ paths and the RAKE has $F$ fingers, then the RAKE output is influenced by

\[ I(F, P, U) = F \cdot P \cdot (U-1) \tag{42} \]

interference terms. If only $i_s$ of the $I(F, P, U)$ interference terms are considered for the interference cancellation, the complexity can be reduced as follows. In a first step all possible interference power constellations based on the cross-correlations and the channel coefficients have to be calculated. Then this interference scenario has to be evaluated with the goal of selecting the $i_s$ strongest interference terms. Finally, the estimated MAI arising from the $i_s$ selected interference constellations is subtracted from the RAKE output. The complexity of one stage of the reduced-complexity NB-PIC (RC-PIC) is given by

\[ O_{RC1}(U, i_s) = 14U \cdot i_s, \tag{43} \]

because two complex multiplications and one complex addition are needed to cancel one interference term. The complexity of an RC-PIC with $L$ stages is then

\[ O_{RC\text{-}PIC}(U, L, i_s) = O_R(U) + L \cdot O_{RC1}(U, i_s). \tag{44} \]

The cost of the interference scenario evaluation (calculation of all possible interference constellations and selection of the $i_s$ strongest interferers) can be neglected because it is done only once per burst. In Fig. 4 the computational complexity of the algorithm described above is shown for the scenario used in Section IV. It can be seen that the complexity reduction is dramatic. If only 6.25% (solid line) and 12.5% (dashed line) of the occurring interference terms are considered for interference cancellation, the additional PIC complexity can be reduced by 90% and 80%, respectively, and the overall complexity is still in the order of the RAKE complexity. For $U = 16$, for example, 6.25% of the $I(4, 4, 16) = 240$ terms gives $i_s = 15$, so that one RC-PIC stage needs $14 \cdot 15 = 210$ operations per user compared with $8F(P(U-1)+1) = 1952$ for the full NB-PIC stage.

Fig. 4: Number of real-valued operations (in units of $10^4$) vs. the number of users U: RAKE (circles), NB-PIC and RC-PIC ($i_s/I(4, 4, U) = 6.25\%$: solid line, $i_s/I(4, 4, U) = 12.5\%$: dashed line)
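The selection-and-cancellation idea of this section can be sketched in a strongly simplified setting. The example below is not the receiver of the paper: it assumes real-valued signals, a single path per user, one symbol and no noise, and the interference matrix `F` and all function names are hypothetical. It only illustrates cancelling the $i_s$ strongest interference terms per user with hard tentative decisions, in the spirit of Eqs. (15) and (20).

```python
def sign(x):
    """Hard tentative decision, cf. Eq. (15)."""
    return 1.0 if x >= 0 else -1.0

def strongest_terms(f_row, i_s):
    """Indices of the i_s largest-magnitude interference terms of one user."""
    order = sorted(range(len(f_row)), key=lambda j: abs(f_row[j]), reverse=True)
    return order[:i_s]

def rc_pic(y, F, i_s, stages=1):
    """Subtract only the i_s strongest interferers per user, cf. Eq. (20)."""
    # The selection depends only on correlations and channel, not on the data,
    # so it is done once (in the paper: once per burst).
    selected = [strongest_terms(row, i_s) for row in F]
    d_hat = list(y)
    for _ in range(stages):
        d_tilde = [sign(v) for v in d_hat]
        d_hat = [y[u] - sum(F[u][j] * d_tilde[j] for j in selected[u])
                 for u in range(len(y))]
    return d_hat

# HYPOTHETICAL toy scenario: 3 users, users 1 and 2 interfere strongly.
d = [1.0, -1.0, 1.0]
F = [[0.0, 0.6, 0.05],
     [0.6, 0.0, 0.05],
     [0.05, 0.05, 0.0]]  # cross-correlations, zero diagonal
# Noise-free matched-filter output: desired symbol plus interference.
y = [d[u] + sum(F[u][j] * d[j] for j in range(3)) for u in range(3)]

full = rc_pic(y, F, i_s=2)     # all interferers cancelled
reduced = rc_pic(y, F, i_s=1)  # only the strongest interferer per user
print("matched filter:", y)
print("full PIC:      ", full)
print("reduced PIC:   ", reduced)
```

In this toy case, cancelling only the dominant interferer per user leaves just the weak residual term, which mirrors the argument that a small share of the interference terms carries most of the MAI.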
VI. Numerical Evaluation

For future applications as mentioned in Section I, the 2.4 GHz ISM band is of interest because it is license-free. To simulate the behavior of the MC-SS FCDMA system in such an environment, channel models are used which are based on measurements for frequencies around 2.2 GHz [10]. The non-line-of-sight indoor model Pic2 is used for the simulations in what follows. The model Pic2 is characterized by a maximum delay of 500 ns and a coherence bandwidth of 2.5 MHz. The reader may refer to [10] for a detailed description of the channel model. The investigated system has the following technical parameters. The signal bandwidth is $B = 8$ MHz, the spreading gain is $K = 16$ and the symbol duration is $T_S = 2\,\mu$s. Assuming the carrier frequency $f_c = 2.4$ GHz and a maximum velocity of $v = 3$ m/s for indoor applications, the coherence time becomes [11] $(\Delta t)_c = 9c/(16\pi v f_c) = 7.5$ ms, so that for a burst length of $N = 100$ symbols (burst duration $T_B = 200\,\mu$s) the channel can be seen as time-invariant.
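The time-invariance argument for one burst can be checked with the parameters stated in this section. The sketch below simply evaluates the coherence-time expression and the burst duration; the speed of light is taken as $3 \cdot 10^8$ m/s, which is an assumption of this illustration, and the variable names are hypothetical.

```python
import math

c = 3.0e8     # speed of light in m/s (rounded value, assumed here)
v = 3.0       # maximum velocity in m/s
f_c = 2.4e9   # carrier frequency in Hz
T_S = 2e-6    # symbol duration in s
N = 100       # symbols per burst
K = 16        # spreading gain
B = 8e6       # signal bandwidth in Hz

coherence_time = 9 * c / (16 * math.pi * v * f_c)  # expression taken from [11]
burst_duration = N * T_S
chip_rate = K / T_S                                # chips per second

print(f"coherence time: {coherence_time * 1e3:.2f} ms")
print(f"burst duration: {burst_duration * 1e6:.0f} us")
print(f"coherence time / burst duration: {coherence_time / burst_duration:.1f}")
print(f"chip rate: {chip_rate / 1e6:.1f} Mchip/s")
```

The coherence time exceeds the burst duration by more than an order of magnitude, which supports treating the channel as constant within a burst; the chip rate $K/T_S$ also matches the stated signal bandwidth.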
Fig. 5: Channel Pic2: BER vs. the number of considered interference terms $i_s/I(F = 4, P = 4, U = 8)$ [%]: RAKE, MFB and RC-PIC with 1 stage and 2 stages

Figure 5 shows the bit error rate vs. the number of considered interference terms $i_s$ for an SNR of 16.5 dB and 8 active users. It can be seen that only 25% of the interference terms are necessary to reach the performance of the FC-PIC. Figure 6 depicts the SNR degradation of the RC-PIC compared to the FC-PIC achieving an uncoded BER of $10^{-2}$ and $10^{-3}$, respectively. It can be seen that for a BER of $10^{-2}$, the SNR degradation is only about 1 dB and [...]

Fig. 6: Channel Pic2: SNR degradation [dB] vs. the number of considered interference terms $i_s/I(F = 4, P = 4, U)$ [%] for BER = $10^{-2}$ and BER = $10^{-3}$: 8 users (solid lines), 16 users (dashed lines)

[...] proposed PIC scheme is an attractive approach for powerful interference cancellation at single-user detection complexity. Therefore, it will be of practical implementation interest.

References

[1] N. Yee, J.-P. Linnartz, and G. P. Fettweis, "Multi-Carrier CDMA in Indoor Wireless Radio Networks," in Proc. 4th IEEE Int. Symp. Personal, Indoor and Mobile Radio Commun., pp. 109–113, 1993.

[2] K. Fazel, "Performance of CDMA/OFDM for Mobile Communication Systems," in Proc. 2nd IEEE Int. Conf. Universal Personal Commun., pp. 975–979, 1993.