IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 4, NO. 5, SEPTEMBER 2005


Frequency-Domain Interference Cancellation and Nonlinear Equalization for CDMA Systems

Stefano Tomasin, Member, IEEE, and Nevio Benvenuto, Senior Member, IEEE

Abstract—In wireless broadband communications using code division multiple access (CDMA), interference cancellation (IC) techniques have been proposed to reduce multiuser access interference (MAI); however, their performance is limited by complexity constraints that still favor the use of rake receivers. In this paper, we propose an IC architecture that makes use of a block decision feedback equalizer (DFE) to remove intersymbol interference (ISI) and whose overall complexity is much lower than equivalent (in terms of performance) existing structures. In fact, by making use of a particular block data transmission format, all filters are implemented in the frequency domain (FD), and in particular, the DFE is iteratively designed according to the reliability of the detected data in the previous iteration. Due to the strong interaction of the FD–IC and the block DFE, error propagation in the receiver is reduced by simply interleaving (INT) chips before transmission. Simulations performed for an uplink communication on a wireless multipath channel show that the combination of FD–IC, block DFE, and chip INT provides an efficient solution with good performance for CDMA systems in dispersive channels. Index Terms—CDMA, frequency-domain equalization, interference cancellation, iterative equalization.

I. INTRODUCTION

Manuscript received July 21, 2003; revised January 5, 2004 and January 23, 2004; accepted July 31, 2004. The editor coordinating the review of this paper and approving it for publication is J. Evans. This research has been developed in the frame of the Italian National Project on Fundamental Research (FIRB) “Reconfigurable platforms for wideband wireless communications.” The authors are with the Department of Information Engineering, University of Padova, 35131 Padova, Italy (e-mail: [email protected]; nb@dei.unipd.it). Digital Object Identifier 10.1109/TWC.2005.853823

The latest generations of mobile networks require an increased capacity to support high-rate multimedia applications on a common communication channel [1]. Since the spectrum is dynamically assigned to each user, an access scheme is needed to share the radio resource. Among the various alternatives proposed in the literature, many are based on code division multiple access (CDMA) due to its flexibility and low-cost implementation. However, multipath propagation of the channel of each user significantly affects the performance of CDMA transmissions. In fact, the dispersion of the channel disrupts the orthogonality among codes of different users, thus yielding multiuser access interference (MAI). Moreover, interference among chips [i.e., interchip interference (ICI)] of the same user arises, and despreading (DS) may not properly restore the data signal with the effect of having intersymbol interference (ISI).

In order to combat MAI, ICI, and ISI, various receiver architectures have been proposed. One of the most common solutions is the rake receiver [2], which for moderately dispersive channels provides a good tradeoff between complexity and performance. On the other hand, as the channel dispersion increases, the performance of this receiver significantly degrades. Indeed, optimum receivers should include multiuser detection (MUD), where more users are simultaneously detected [3]. A suboptimal but still effective MUD technique is interference cancellation (IC), where, after a bank of rake receivers, the data of all users are detected to obtain tentative decisions that are used to regenerate MAI, which in turn is canceled from the received signal. Two IC schemes have been proposed: serial IC (SIC) and parallel IC (PIC). In SIC, at each stage, only one user is detected and its interference contribution is canceled before detection of the next user. In PIC, simultaneous detection of all users and IC is performed iteratively. However, the performance of traditional SIC and PIC schemes with rake receivers is limited by both ICI and ISI. Additionally, because the rake receiver operates at the chip rate, the computational complexity of these schemes may be significant for channels with more than just a few taps.

For the reduction of ICI, linear equalizers operating at the chip rate have been studied in [4] and [5] while a joint equalization and MUD scheme has been considered in [6]. With regard to nonlinear solutions, a common structure comprises a feed-forward (FF) filter, operating at the chip rate, and a feedback (FB) filter that is fed with past detected symbols [7]–[11]. Moreover, a scheme that iterates IC and linear equalization has been proposed in [11], where coding was also included in the iterative process, thus increasing significantly the overall complexity. All of these structures operate in the time domain at the chip rate; hence, their computational complexity is quite high.
To reduce the complexity, two efficient implementations of PIC have been investigated in [12]. The first concentrates the IC only on the most significant terms while the second exploits the multistage structure of PIC to avoid double calculations. In this paper, we propose a new IC architecture that allows the use of a more elaborate equalizer at an affordable complexity. Indeed, a structure that is very similar to ours was proposed in [13] and [14], where both design and performance were derived analytically only for channels with particular statistical properties and in the asymptotic case of infinite length filters. The efficiency of the receiver is obtained by implementing all filters in the frequency domain (FD) through elementwise products of discrete Fourier transforms (DFTs), and the resulting structure is denoted FD–IC. The equivalence between the time-domain filtering and the FD products is ensured by a particular block data transmission format, where the stream

1536-1276/$20.00 © 2005 IEEE


of chips is divided into blocks separated either by a cyclic extension or by a fixed PN sequence [15], [16]. As in similar data block processing structures (e.g., orthogonal frequency division multiplexing [16]), FD–IC has two main drawbacks: 1) a slight bandwidth inefficiency, due to the data block extension, and 2) an increased latency, since detection is performed on blocks of CDMA symbols. At the same time, as single carrier transmission, FD–IC has various advantages over multicarrier CDMA, including a reduced peak to average power ratio and a reduced sensitivity to carrier frequency offset and phase noise. This leads to a smaller size and lower cost transmitter (mobile station) [15], [17], [18]. At the receiver, for each stage of the IC process, a block decision feedback equalizer (DFE) is iterated with detection. In particular, for each user, at the first block iteration, only the FF filter is active and a tentative decision on data is performed. At the next iterations, decisions of previous iterations are used to generate the FB filter input signal, whose output is added to the FF filter output before the new data detection. In our scheme, at each block iteration of the equalization–detection process, coefficients of both FF and FB filters are varied depending on the reliability of the detected data at the previous iteration. However, we have verified by simulations that the selected Walsh–Hadamard CDMA spreading (SP) codes together with the FD implementation of the block DFE yield error propagation. In fact, detection errors translate into spikes in the FD. To reduce this phenomenon, the chips of each block are interleaved before transmission so that, after de-interleaving, errors are spread over all frequencies of the FD FB filter. Simulations have been performed for the FD–IC scheme using both PIC and SIC on an uplink wireless broadband channel. 
Results show that FD–IC significantly outperforms existing IC schemes based on the rake receiver due to its ability to combat ISI efficiently. This paper is organized as follows. In Section II, we describe the system model and the data transmission format that will be used in the FD implementation of the receiver, including the chip-level INT. Section III is devoted to the description of the FD implementation of both SIC and PIC and of the iterative block DFE detector. The design of the filter coefficients of the DFE is considered in Section IV. An evaluation of the complexity of the FD–IC scheme is given in Section V and is compared with that of the efficient structures [12]. The performance of the FD–IC is derived in Section VI with simulations on an uplink broadband communication. Lastly, in Section VII, we derive the conclusions of this work.

II. SYSTEM MODEL

We consider a wideband CDMA communication system, where $K$ users are transmitting simultaneously, using codes with an SP factor $N_S$. For the user $k$, the data signal$^1$ $d^{(k)}(n)$, at rate $1/T$, is spread with the code $c^{(k)}(m)$, $m = 0, 1, \ldots, N_S - 1$, to obtain the data sequence (chips)

$$ s^{(k)}(m + nN_S) = c^{(k)}(m)\, d^{(k)}(n), \qquad k = 0, 1, \ldots, K - 1 \tag{1} $$

having rate $N_S/T$. For transmission, $s^{(k)}(n)$ is modulated with the pulse-shaping transmit filter. In a wideband uplink transmission, for each user, the channel can be modeled as the sum of delayed paths with different attenuations and phases. At the base station, the received signal is filtered by a filter matched to the transmit pulse shape and then sampled at the rate $N_S/T$ [2]. By denoting with $h^{(k)}(q)$, $q = 0, 1, \ldots, N_q - 1$, the discrete-time baseband equivalent impulse response of the channel for the user $k$, including pulse shaping and matched filter, the received discrete-time signal comprises the sum of the signals of all users, i.e.,

$$ r(\ell) = \sum_{k=0}^{K-1} r^{(k)}(\ell) + w(\ell) = \sum_{k=0}^{K-1} \sum_{q=0}^{N_q-1} h^{(k)}(q)\, s^{(k)}(\ell - q) + w(\ell) \tag{2} $$

$^1$ Notation: a signal is denoted with a lowercase letter while its DFT is denoted with the corresponding uppercase letter. Vectors and matrices are denoted with bold italic letters. $^*$ denotes complex conjugate.

where $w(\ell)$ is the complex additive white Gaussian noise (AWGN) term having zero mean and power spectral density $N_0/2$ per dimension. If DS with the code of the user $k$ is directly applied to $r(\ell)$, the obtained signal is affected by MAI and ISI. Thus, more elaborate receivers must be used. In this paper, we propose a receiver with joint IC and equalization where filtering is performed in the FD.

A. Data Transmission Format

In order to implement filter operations as the product of DFTs on sample blocks of size $P$, the convolution between the channel impulse response and the transmitted data signal must be circular on blocks of size $P$. We summarize here two methods for forcing the circularity. In [15] and [17], it has been proposed to append to each block of $M$ information data

$$ \mathbf{s}^{(k)}(m) = \left[ s^{(k)}(mM),\, s^{(k)}(mM + 1),\, \ldots,\, s^{(k)}(mM + M - 1) \right] \tag{3} $$

a known PN sequence $[p(0), p(1), \ldots, p(L-1)]$. The transmitted data block of length $P = M + L$ is given by

$$ \mathbf{s}^{(k)}(m) = \left[ s^{(k)}(mP),\, s^{(k)}(mP + 1),\, \ldots,\, s^{(k)}(mP + P - 1) \right] = \left[ s^{(k)}(mM),\, s^{(k)}(mM + 1),\, \ldots,\, s^{(k)}(mM + M - 1),\, p(0),\, p(1),\, \ldots,\, p(L - 1) \right]. \tag{4} $$
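The spreading of (1) and the PN block extension of (4) can be illustrated with a short sketch. The code row, the data symbols, the extension sequence `pn`, and the helper names `spread` and `pn_extend` are all hypothetical choices made for illustration; only the two formulas come from the text.

```python
def spread(data, code):
    """Eq. (1): s(m + n*N_S) = c(m) * d(n) for every symbol d(n)."""
    return [c * d for d in data for c in code]


def pn_extend(chips, block_len, pn):
    """Eq. (4): split chips into blocks of block_len (M) and append pn (length L)."""
    if len(chips) % block_len:
        raise ValueError("chip stream must contain a whole number of blocks")
    out = []
    for start in range(0, len(chips), block_len):
        out.extend(chips[start:start + block_len])
        out.extend(pn)
    return out


code = [1, -1, 1, -1]      # one length-4 Walsh-Hadamard row, so N_S = 4
data = [1, -1, -1, 1]      # BPSK symbols d(n)
pn = [1, 1, -1]            # hypothetical known extension, L = 3

chips = spread(data, code)          # 16 chips
stream = pn_extend(chips, 8, pn)    # M = 8, so P = M + L = 11
```

Here two blocks of $M = 8$ chips become two transmitted blocks of $P = 11$ chips, each ending with the same known sequence.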

TOMASIN AND BENVENUTO: FREQUENCY-DOMAIN INTERFERENCE CANCELLATION AND NONLINEAR EQUALIZATION

This extension is denoted zero padding in the case $p(n) = 0$, $n = 0, 1, \ldots, L - 1$ [19]. Note that an additional PN extension is required before the first data block. Another technique that forces the convolution to be circular is the cyclic-prefix (CP) extension, where the first $L$ symbols coincide with the last symbols of the transmitted blocks, i.e.,

$$ \mathbf{s}^{(k)}(m) = \left[ s^{(k)}(mM),\, s^{(k)}(mM + 1),\, \ldots,\, s^{(k)}(mM + M - 1),\, s^{(k)}(mM),\, \ldots,\, s^{(k)}(mM + L - 1) \right]. \tag{5} $$

Here, the convolution is circular on blocks of size $P = M$. CP transmission is used in multicarrier communications [16] and has also been proposed for single-carrier transmissions with linear FD equalization [20], [21]. We note that, while in the DFE proposed in [15] and [17], where the FB operates in the time domain, only the PN extension method could be used, here there is no such limitation. However, it is seen that the PN extension method yields a better performance than the CP method because it implies a reduced error propagation phenomenon in the block DFE. On the other hand, for the same bandwidth efficiency, the CP method has a reduced complexity since DFTs are performed on blocks of size $M$ instead of $P$.

We indicate with $r(n)$ the received signal (2) corresponding to the data signal (4). Provided that the length $L$ of the PN sequence or CP is not lower than the channel length $N_q$, the DFT of $r(n)$ on the $m$th block of size $P$ will be

$$ R_p(m) = \sum_{\ell=0}^{P-1} e^{-j 2\pi p \ell / P}\, r(mP + \ell) \tag{6} $$

where $p = 0, 1, \ldots, P - 1$. Analogously, let us denote with $\mathbf{S}^{(k)}(m)$, $\mathbf{W}(m)$, and $\mathbf{H}^{(k)}$ the DFTs of $\mathbf{s}^{(k)}(m)$, $\mathbf{w}(m)$, and $\mathbf{h}^{(k)}$, respectively. Assuming that all users are quasisynchronous, i.e., the impulse response of each channel falls within the extension of length $L$, the received signal can be written in the FD as

$$ R_p(m) = \sum_{k=0}^{K-1} H_p^{(k)} S_p^{(k)}(m) + W_p(m). \tag{7} $$
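The step from (2) to (7) relies on the known extension turning the linear channel convolution into a circular one on each block. The following sketch checks this numerically for a single user without noise. The block contents, the extension `pn`, and the channel taps `h` are hypothetical values, and a plain O(P^2) DFT is used to keep the example self-contained.

```python
import cmath


def dft(x):
    """Direct DFT, matching the sign convention of eq. (6)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * p * t / n) for t in range(n))
            for p in range(n)]


M, L = 6, 3
P = M + L
pn = [1, -1, 1]                          # hypothetical known extension
info0 = [1, 1, -1, 1, -1, -1]            # information chips of block m = 0
info1 = [-1, 1, 1, -1, 1, -1]            # information chips of block m = 1
stream = info0 + pn + info1 + pn         # two PN-extended blocks, eq. (4)
h = [1.0, 0.5 - 0.2j, 0.25j]             # hypothetical channel, N_q = 3 <= L


def received(n):
    """Noise-free single-user version of eq. (2)."""
    return sum(h[q] * stream[n - q] for q in range(len(h)))


r_block = [received(P + n) for n in range(P)]   # samples of block m = 1
R = dft(r_block)                                # eq. (6)
S = dft(stream[P:2 * P])
H = dft(h + [0.0] * (P - len(h)))
max_err = max(abs(R[p] - H[p] * S[p]) for p in range(P))
```

The samples that leak into block 1 from the past belong to the extension of block 0, which is identical to the extension closing block 1, so `max_err` stays at the level of floating-point rounding.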

In the following, we will use the PN extension method. However, variations for a CP method are straightforward. Moreover, to simplify the notation, we omit the block index $m$ since all operations are performed on blocks of $P$ samples of the received signal.

B. Chip INT

It is well known that in a DFE, the use of past detected data yields error propagation phenomena. Here, the impact of errors is particularly significant because of CDMA SP and the FD implementation of the DFE. In fact, the error on one data symbol is propagated by the SP on an entire CDMA symbol (i.e., $N_S$ chips), thus generating peaks of errors in the FD according to the spectral characteristics of the SP code. In order to alleviate the phenomenon, chips are interleaved before transmission. In particular, the chip interleaver operates within each block and interleaves chips of different symbols. Due to its simplicity, we consider a row–column block interleaver where the $M$ information chips are written in a matrix column-wise and read row-wise [2]. In the Appendix, by a simple example, we can see that INT indeed spreads the burst of chip errors among the different frequencies of the equalizer.

III. IC AND BLOCK DFE

The most common structure for wideband CDMA transmissions comprises a rake, i.e., a filter matched to the channel impulse response, followed by a despreader and a detector. In our FD implementation, the matched filter operation yields the FD signal

$$ Y_p^{(k)} = H_p^{(k)*} R_p, \qquad p = 0, 1, \ldots, P - 1. \tag{8} $$

Note that in an FD implementation, no reduction of complexity is achieved by considering only a subset of the coefficients (fingers) of the channel impulse response. In a traditional basic receiver, after the matched filter, the vector $\mathbf{Y}^{(k)} = [Y_0^{(k)}, Y_1^{(k)}, \ldots, Y_{P-1}^{(k)}]$ would be transformed into the time domain by an inverse DFT (IDFT). The signal is then despread. Lastly, detection follows. In a more elaborate receiver, the impact of MAI can be reduced by applying an IC scheme. Indeed, the cancellation and detection procedures may be iterated a few times in order to increase the reliability of data. Here, we propose to integrate IC with a block DFE that mitigates the effects of ISI. To improve efficiency, all filtering operations are performed in the FD. To this end, we first describe an FD implementation of SIC and PIC schemes. Then, we design the block DFE.

A. FD Implementation of SIC

In SIC, IC is performed serially for each user and its FD implementation (FD–SIC) is shown in Fig. 1. The received signal is first divided into blocks of size $P$ by a serial to parallel converter (S/P) and each block is transformed into the FD to obtain $\mathbf{R}$, whose components are given by (7). The following operations perform a multistage detection and IC of each user signal. At the first stage, $\mathbf{R}$ is used as input of the block interference$_0$, which performs data detection of the user $k = 0$ and the subsequent interference generation that is canceled from the input of the next stage. In general, in the basic SIC receiver, the interference$_k$ block includes the matched filter $\mathbf{H}^{(k)*}$ (or rake), the despreader, and the detector that generates the signal $\hat{\mathbf{d}}^{(k)}$. To obtain the interference contribution in the FD $\mathbf{V}^{(k)}$, $\hat{\mathbf{d}}^{(k)}$ is first spread to form the chips $\hat{s}^{(k)}$, which in turn yield the augmented block $\hat{\mathbf{s}}^{(k)}$ by appending the PN sequence [see (4)]. $\hat{\mathbf{s}}^{(k)}$ is then transformed into the FD to obtain the vector $\hat{\mathbf{S}}^{(k)}$, which is multiplied by the frequency response of the corresponding channel, i.e.,

$$ V_p^{(k)} = H_p^{(k)} \hat{S}_p^{(k)}, \qquad p = 0, 1, \ldots, P - 1. \tag{9} $$

Fig. 1. General architecture of SIC in the FD.

A more detailed description of the interference$_k$ block is presented in Section III-C, where a DFE before data detection is used. With regard to Fig. 1, each stage of SIC includes cancellation of the interference contribution of previously detected users. Hence, the input of the interference$_k$ block is given by

$$ \mathbf{X}^{(k)} = \mathbf{R} - \sum_{u=0}^{k-1} \mathbf{V}^{(u)} = \mathbf{X}^{(k-1)} - \mathbf{V}^{(k-1)} \tag{10} $$

where $k = 1, 2, \ldots, K - 1$. The order of cancellation of the various users is varied according to the power of the received user signals and their interference contribution to other users [22]. Note in fact that the effect of MAI on the $\mathbf{X}^{(k)}$ signal diminishes from the first to the last stage, and for equal user bit-error rate (BER) performance, a proper power control must be implemented [23].

B. FD Implementation of PIC

In PIC, IC is performed in parallel for all users and its FD implementation (FD–PIC) is shown in Fig. 2. As in FD–SIC, also here, the interference$_k$ block performs detection and generation of the interference contribution of user $k$. The following operations iteratively perform detection and IC of user signals $I$ times. We refer to PIC iterations in the PIC process to avoid confusion with the term iteration, which is reserved for the equalization–detection process. At the first PIC iteration ($i = 1$), all FB signals of Fig. 2 are not active and the received signal is applied to a bank of interference$_k$ blocks, performing parallel detection of all user data signals and generation of MAI. At PIC iterations $i = 2, 3, \ldots, I$, the input to each interference$_k$ block is obtained by canceling from the received signal the MAI due to all other users, as shown in the efficient implementation of Fig. 2. In this case, the signal at the input of the interference$_k$ block is

$$ \mathbf{X}^{(k)} = \mathbf{R} - \sum_{u=0,\, u \neq k}^{K-1} \mathbf{V}^{(u)}, \qquad k = 0, 1, \ldots, K - 1. \tag{11} $$

As an alternative, cancellation could include also ISI [24] and in this case

$$ X_p^{(k)} = R_p - \left[ H_p^{(k)} - e^{-j 2\pi p \Delta_k / P}\, h^{(k)}(\Delta_k) \right] \hat{S}_p^{(k)} - \sum_{u=0,\, u \neq k}^{K-1} V_p^{(u)} \tag{12} $$

$k = 0, 1, \ldots, K - 1$, $p = 0, 1, \ldots, P - 1$, where $\Delta_k$ is the delay corresponding to the coefficient with the largest amplitude of the channel impulse response of user $k$. Note that in the scheme of Fig. 2 the IC removes the contribution of other users. In order to take into account the reliability of the tentative decision, in [25] it is proposed to weight the signals before cancellation and obtain a tradeoff between cancellation of interference and insertion of a disturbance due to decision errors. This weighting can be easily accomplished in the FD–PIC scheme by multiplying $\mathbf{V}^{(k)}$ with an appropriate gain before cancellation. Similarly, in other IC schemes soft detected data are used to remove MAI in each IC iteration [6], with much improved performance. For a coded transmission, since all operations are performed on blocks of CDMA symbols, the detected signal can also be decoded before FD–PIC is performed [11]. Here, we focus on the basic FD–IC structure, and we do not consider weighting and coding.

Overall, the main differences between FD–PIC and FD–SIC are similar to their time-domain implementation counterparts [26]. In particular, since users are detected in cascade, FD–SIC has a longer processing delay than FD–PIC, which performs detection in parallel. On the other hand, as confirmed by the results reported in Section V, FD–SIC has a reduced complexity, since MAI is canceled only once from each user. In Section VI the performance of the two schemes is compared in terms of BER, and we show that FD–PIC outperforms FD–SIC in a broadband uplink wireless transmission.

Fig. 2. General architecture of PIC in the FD.

C. FD Equalization and Interference Generation

Both FD–PIC and FD–SIC include the interference$_k$ blocks that will be described here in more detail. While the outer IC is used to reduce MAI, the purpose of the interference$_k$ block is to regenerate the interference due to user $k$ in the received signal. Indeed, this signal is affected by ISI, and, for better detection (and, hence, regeneration), a DFE is used. In other words, we replace the traditional matched filter (or rake) with a block iterative DFE to combat ISI. Contrary to the DFE proposed in [9], where the FB signal was designed to cancel only the interference among contiguous CDMA symbols, here the DFE operates at the chip rate, hence it is more effective in equalizing.

Fig. 3. Description of the interference$_k$ block in the FD, which comprises a DFE and an interference generator, formed of a despreader followed by a detector and a spreader whose output feeds both the FB filter of DFE and the interference pulse shaping filter.

Fig. 4. FD implementation of DFE$_k$.

The general scheme of the interference$_k$ block is shown in Fig. 3. Note that, due to the CDMA transmission format,

a time-domain DFE cannot be applied directly on the received signal since DS must be performed on blocks of chips before the FB signal is available. Hence, we use a block DFE, where the FB data signal is generated in blocks. Here, we propose an implementation of a block DFE operating in the FD, where equalization and detection are iterated $\nu$ times on the input signal $\mathbf{X}^{(k)}$. Since all FD signals are already available, the DFE scheme is reduced to the simple operations shown in Fig. 4. At the iteration $l$, the FF filter is implemented in the FD by the element-wise complex multiplication of $\mathbf{X}^{(k)}$ with the vector of the FF filter coefficients $\mathbf{C}^{(k,l)} = [C_0^{(k,l)}, C_1^{(k,l)}, \ldots, C_{P-1}^{(k,l)}]$, $l = 1, 2, \ldots, \nu$. The FB filter coefficients in the FD are denoted as $\mathbf{B}^{(k,l)} = [B_0^{(k,l)}, B_1^{(k,l)}, \ldots, B_{P-1}^{(k,l)}]$. When the DFE is first applied ($l = 1$) to the received vector signal, no previous decision is available and the FB signal is not active, i.e., $B_p^{(k,1)} = 0$, $p = 0, 1, \ldots, P - 1$. In this case, DFE$_k$ becomes a linear equalizer and the resulting vector signal has the elements

$$ Y_p^{(k,1)} = X_p^{(k)} C_p^{(k,1)}, \qquad p = 0, 1, \ldots, P - 1. \tag{13} $$

$\mathbf{Y}^{(k,1)}$ is then transformed into the time domain by IDFT. After DS, detection (DET) yields the tentative decision vector $\hat{\mathbf{d}}^{(k,1)}$.
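The first equalization iteration (element-wise product (13), IDFT, removal of the extension, DS, and detection) can be sketched as follows. This is a minimal single-user, noise-free illustration without chip interleaving. The channel taps, the extension, and the powers `M_S` and `M_W` are assumed values, and the FF coefficients follow the linear minimum MSE form that (24) reduces to when no previous decision is available.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * p * t / n) for t in range(n))
            for p in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[p] * cmath.exp(2j * cmath.pi * p * t / n) for p in range(n)) / n
            for t in range(n)]


code = [1, -1, 1, -1]                 # N_S = 4
data = [1, -1]                        # transmitted BPSK symbols
pn = [1, 1, -1]                       # hypothetical PN extension, L = 3
block = [c * d for d in data for c in code] + pn
M, P = 8, len(block)

h = [1.0, 0.4, 0.2j]                  # hypothetical channel, N_q = 3
H = dft(h + [0.0] * (P - len(h)))
S = dft(block)
X = [H[p] * S[p] for p in range(P)]   # noise-free, single-user version of (7)

M_S, M_W = float(P), 0.01 * P         # assumed FD signal and noise powers
C = [H[p].conjugate() / (M_W + M_S * abs(H[p]) ** 2) for p in range(P)]

Y = [X[p] * C[p] for p in range(P)]   # eq. (13)
y = idft(Y)[:M]                       # IDFT, then discard the last L samples
soft = [sum(c * y[i + m] for m, c in enumerate(code))   # despreading
        for i in range(0, M, len(code))]
d_hat = [1 if z.real >= 0 else -1 for z in soft]        # BPSK detection
```

On this mild channel the linear equalizer alone already recovers `data`; the FB filter of the later iterations matters when the residual ISI is larger.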

The detailed description of the various blocks is shown in Fig. 5, where parallel to serial conversion is denoted with P/S. Note that after the IDFT of the vector signal $\mathbf{Y}^{(k,1)}$, the last $L$ symbols are discarded before de-interleaving, and DS is performed. Detection follows. Indeed, there are linear IC schemes that do not include detection [27].

A new equalization iteration can now be applied. First, the detected data block $\hat{\mathbf{d}}^{(k,1)}$ is regenerated by SP, INT, and PN insertion. The whole block is transformed in the FD by DFT to obtain the data vector $\hat{\mathbf{S}}^{(k,1)}$. Then, equalization of the input signal is performed, which now includes the FB coefficients $B_p^{(k,2)}$, $p = 0, 1, \ldots, P - 1$, in the DFE, as shown in Fig. 4. In general, from the second iteration (i.e., $l > 1$), the DFE$_k$ generates the vector signal $\mathbf{Y}^{(k,l)}$ with elements

$$ Y_p^{(k,l)} = X_p^{(k)} C_p^{(k,l)} + B_p^{(k,l)} \hat{S}_p^{(k,l-1)} \tag{14} $$

where $p = 0, 1, \ldots, P - 1$. In Section IV, we provide an algorithm for the iterative design of both FF and FB filters according to the minimum mean square error (MSE) criterion. The equalization and detection processes are iterated $\nu$ times until a reliable decision vector $\hat{\mathbf{S}}^{(k,\nu)}$ is available that is multiplied by the corresponding channel frequency response to obtain the interference contribution in the FD that must be canceled from the received signal

$$ V_p^{(k)} = H_p^{(k)} \hat{S}_p^{(k,\nu)}, \qquad p = 0, 1, \ldots, P - 1. \tag{15} $$
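The regeneration chain (SP, INT, PN insertion, DFT, and multiplication by the channel response as in (15)) and one SIC cancellation step as in (10) can be sketched as below. The two users, their codes, channels, and the helper names `interleave` and `regenerate` are hypothetical, the decisions are assumed error-free, and noise is omitted so that the cancellation is exact.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * p * t / n) for t in range(n))
            for p in range(n)]


def interleave(chips, rows, cols):
    """Row-column block interleaver: write column-wise, read row-wise."""
    if len(chips) != rows * cols:
        raise ValueError("block does not fill the interleaver matrix")
    return [chips[c * rows + r] for r in range(rows) for c in range(cols)]


def regenerate(d_hat, code, pn, H, rows, cols):
    """SP, INT, PN insertion, DFT, then eq. (15): V_p = H_p * S_hat_p."""
    chips = [c * d for d in d_hat for c in code]
    block = interleave(chips, rows, cols) + pn
    S_hat = dft(block)
    return [H[p] * S_hat[p] for p in range(len(block))]


pn = [1, 1, -1]                                   # hypothetical extension, L = 3
P = 8 + len(pn)                                   # M = 8 information chips
H0 = dft([1.0, 0.3, 0.1j] + [0.0] * (P - 3))      # hypothetical channel, user 0
H1 = dft([0.8, -0.2j, 0.3] + [0.0] * (P - 3))     # hypothetical channel, user 1

V0 = regenerate([1, -1], [1, 1, 1, 1], pn, H0, 2, 4)
V1 = regenerate([-1, -1], [1, -1, 1, -1], pn, H1, 2, 4)
R = [V0[p] + V1[p] for p in range(P)]             # noise-free version of (7)

X1 = [R[p] - V0[p] for p in range(P)]             # eq. (10): input of stage k = 1
residual = max(abs(X1[p] - V1[p]) for p in range(P))
```

With correct decisions on user 0, the input of the next stage contains only the contribution of user 1, so `residual` is at rounding level. A detection error on user 0 would instead leave a residual shaped by `H0`.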

The interaction between equalization and cancellation is different for SIC and PIC. Here, in the FD–SIC scheme, the interferencek block is activated only once, when detecting user k. In the FD–PIC scheme, instead, the cancellation and equalization processes are integrated. In particular, the detected data at the previous PIC iteration are used as input of the FB

in the first equalization iteration of the interference$_k$ block. Let us denote with $\hat{\mathbf{S}}^{(k,0,i)}$ the vector of the detected data at the $i$th PIC iteration. For $i = 1$, (13) still holds true, while for $i > 1$, the first iteration of the DFE yields

$$ Y_p^{(k,1)} = X_p^{(k)} C_p^{(k,1)} + B_p^{(k,1)} \hat{S}_p^{(k,0,i)} \tag{16} $$

where $p = 0, 1, \ldots, P - 1$. In other words, within a PIC scheme, we may have an interference$_k$ block using a DFE with $\nu = 1$.

Fig. 5. Transformations to obtain the regenerated data block $\hat{\mathbf{S}}$, starting from the equalized sample block $\mathbf{Y}$, both in the FD.

IV. DESIGN METHODS

For the design of DFE filters, the optimum criterion is based on the MSE at symbol level. However, this approach requires solving a linear system of equations with the inversion of a large matrix. Considering that this operation must be repeated at each equalization iteration, here we resort to a suboptimum approach with the equalizer operating at the chip level. The result is a much simpler solution in the FD. We consider the minimization of the MSE after the IDFT in Fig. 5, which at iteration $l = 1, 2, \ldots, \nu$, can be written in the FD as

$$ J^{(k,l)} = \frac{1}{P^2} \sum_{p=0}^{P-1} E\left[ \left| C_p^{(k,l)} X_p^{(k)} + B_p^{(k,l)} \hat{S}_p^{(k,l-1)} - S_p^{(k)} \right|^2 \right]. \tag{17} $$

By assuming that chips are independent identically distributed (i.i.d.) random variables, with zero mean, and statistically independent from the noise, the expectation in (17) with respect to data and noise signals yields the MSE

$$ J^{(k,l)} = \frac{1}{P^2} \sum_{p=0}^{P-1} \left\{ \left| C_p^{(k,l)} \right|^2 M_W + \left| C_p^{(k,l)} H_p^{(k)} - 1 \right|^2 M_S + \left| B_p^{(k,l)} \right|^2 M_S + 2\,\mathrm{Re}\left[ B_p^{(k,l)*} \left( C_p^{(k,l)} H_p^{(k)} - 1 \right) r^{(k,l)} \right] \right\} \tag{18} $$

where $M_W = P N_0$ is the noise power in the FD, $M_S$ is the power of each element of $\mathbf{S}^{(k)}$, and $r^{(k,l)}$ is the correlation between the transmitted data and the detected data at the previous iteration [13]

$$ r^{(k,l)} = E\left[ d^{(k)}(n)\, \hat{d}^{(k,l-1)*}(n) \right]. \tag{19} $$

Now, the correlation depends on the channel and on the noise level, and it must be estimated at the receiver. Here, we summarize a method where an estimate of $r^{(k,l)}$ is obtained from $\mathbf{X}^{(k)}$ and the FD detected signal at the previous iteration $\hat{\mathbf{S}}^{(k,l-1)}$

$$ \hat{r}^{(k,l)} = \eta \sum_{p=0}^{P-1} \frac{X_p^{(k)}}{H_p^{(k)}}\, \hat{S}_p^{(k,l-1)*} \tag{20} $$

with $\eta$ as a correction factor ($\eta < 1$) to reduce the DFE error propagation phenomena. Indeed, in this design method, since the reliability of the detected signal at the FB input is increasing with the number of iterations, the filters will be different at the various iterations.

To derive the filters that minimize (18), we also impose the constraint that the FB filter removes pre- and post-cursors, but does not remove the desired component, i.e., it must be

$$ \sum_{p=0}^{P-1} B_p^{(k,l)} = 0. \tag{21} $$

The application of the gradient method to minimize (18), with respect to the FB filter coefficients, yields

$$ B_p^{(k,l)} = -\frac{\hat{r}^{(k,l)}}{M_S} \left[ H_p^{(k)} C_p^{(k,l)} - \gamma^{(k,l)} \right] \tag{22} $$

$p = 0, 1, \ldots, P - 1$, where

$$ \gamma^{(k,l)} = \frac{1}{P} \sum_{p=0}^{P-1} H_p^{(k)} C_p^{(k,l)} \tag{23} $$

and for the FF filter coefficients

$$ C_p^{(k,l)} = \frac{H_p^{(k)*}}{M_W + M_S \left( 1 - \dfrac{\left| \hat{r}^{(k,l)} \right|^2}{M_S^2} \right) \left| H_p^{(k)} \right|^2}. \tag{24} $$
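The coefficient computation of (22)–(24) is cheap enough to repeat at every equalization iteration. The sketch below evaluates it for one user under illustrative assumptions: the channel taps, the powers `M_W` and `M_S`, and the correlation estimate `r_hat` are made-up values, and `gamma` is taken as the average of $H_p C_p$ over the $P$ frequencies, which is the choice that makes the FB coefficients satisfy the zero-sum constraint (21).

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * p * t / n) for t in range(n))
            for p in range(n)]


def dfe_coefficients(H, M_W, M_S, r_hat):
    """Return (C, B): FF coefficients from (24) and FB coefficients from (22)."""
    P = len(H)
    # Effective signal power seen by the FF filter; shrinks as |r_hat| grows.
    scale = M_S * (1.0 - abs(r_hat) ** 2 / M_S ** 2)
    C = [h.conjugate() / (M_W + scale * abs(h) ** 2) for h in H]
    # Assumed normalization: gamma is the mean of H_p * C_p, so sum(B) == 0.
    gamma = sum(h * c for h, c in zip(H, C)) / P
    B = [-(r_hat / M_S) * (h * c - gamma) for h, c in zip(H, C)]
    return C, B


H = dft([1.0, 0.5, 0.25j] + [0.0] * 5)                # hypothetical channel, P = 8
C1, B1 = dfe_coefficients(H, 0.8, 8.0, 0.0)           # no decisions available yet
C2, B2 = dfe_coefficients(H, 0.8, 8.0, 6.0)           # fairly reliable decisions
constraint = abs(sum(B2))                             # eq. (21)
```

With `r_hat = 0` the FB filter vanishes and the FF filter is the linear minimum MSE equalizer; as `r_hat` grows the FF filter moves toward a matched filter and the FB filter takes over the ISI removal.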

According to the discussion at the end of Section III, special attention should be paid to the first iteration of the equalizer ($l = 1$), which yields different solutions, according to the adopted IC scheme.

• FD–SIC. At each SIC stage (i.e., user $k$ detection), for $l = 1$, no tentative decision is available; hence, $\hat{r}^{(k,1)} = 0$, $\mathbf{B}^{(k,1)} = \mathbf{0}$ and the FF filter is a linear minimum MSE equalizer.

• FD–PIC. At the first PIC iteration ($i = 1$), no tentative decision is available at the first DFE iteration; hence, $\hat{r}^{(k,1)} = 0$ and $\mathbf{B}^{(k,1)} = \mathbf{0}$, for $k = 0, 1, \ldots, K - 1$. At the next PIC iterations ($i > 1$), a tentative decision of the previous PIC iteration is available also for $l = 1$ and a DFE can be used for each user.

A. Rake-Based FD Equalization

For comparison purposes, we consider a simpler solution for the equalizer, where the DFE is substituted by a filter matched to the channel [see (8)]. In this case, within the interference$_k$ block, only one equalization iteration is performed, and in (14) and (16), FB is never active, i.e.,

$$ C_p^{(k,1)} = H_p^{(k)*}, \qquad p = 0, 1, \ldots, P - 1. \tag{25} $$

This choice of filter provides receivers similar to the time-domain PIC and SIC schemes, where a matched filter (rake receiver) is used as an equalizer. However, compared to the traditional time-domain rake, the block-extended transmission yields a loss of the signal to noise ratio at the channel output (i.e., $E_b/N_0$) of $M/(M + L)$. In the following, we denote the IC schemes implemented in the FD based on the matched filter as RAKE–FD–PIC and RAKE–FD–SIC. When a DFE is used, the resulting structures are denoted as DFE–FD–PIC and DFE–FD–SIC.

V. COMPUTATIONAL COMPLEXITY

A significant advantage of the FD–IC is its reduced complexity when compared to existing IC structures. The efficiency of the architecture is due to the use of DFTs for the implementation of filters. Here, we evaluate the complexity of the FD–IC in terms of the required number of digital signal processor (DSP) operations per output data symbol. Similar to [12], we consider a virtual processor that performs one real floating-point operation per cycle. Complex operations require more elementary operations (ops), i.e., for complex addition and multiplication, two and four ops are needed, respectively, while for a cycle of a complex multiplication and addition four ops are required. Let us consider a receiver with a rake having $N_r$ fingers. Then, the complexity per output data symbol of a bank of $K$ rake receivers, implemented in the time domain, is given by

$$ O_{\rm rake} = K N_r (4 N_S + 4). \tag{26} $$

For the FD–IC front-end receiver, a DFT of size P is required with a complexity   4 P log2 (P ) (27) Ofe = M 2 N S

where we have divided by (M/NS ) since in one transmitted block there are (M/NS ) data symbols. The FF and FB filters of the DFEk block require ODFE = 8P operations while the remaining cascade of Fig. 3 requires two DFTs and two SP operations, hence OIDFT/DS/DET/SP/DFT = 4P log2 (P ) + 8P. The overall complexity per user symbol of the interferencek block, comprising the multiplication for the channel coefficients, is then Ointk (ν) =

1 ν(ODFE + OIDFT/DS/DET/SP/DFT ) + 4P

M NS

(28)

where we recall that $\nu$ is the number of equalization iterations. For FD–SIC, the addition of $X^{(k)}$ with the output of the interference$_k$ block is needed to derive $X^{(k+1)}$, thus adding a complexity $2P/(M/N_S)$. The overall complexity is

$$O_{\mathrm{FD\text{-}SIC}} = O_{\mathrm{fe}} + (K-1)\left[O_{\mathrm{int}_k}(\nu) + \frac{2P}{M/N_S}\right]. \tag{29}$$

Similarly, for FD–PIC with $I$ PIC iterations, the overall complexity is

$$O_{\mathrm{FD\text{-}PIC}} = O_{\mathrm{fe}} + I\left[(K-1)\,O_{\mathrm{int}_k}(\nu) + \frac{2(2K+1)P}{M/N_S}\right]. \tag{30}$$

For comparison, we also report the complexity of the architectures considered in [12]. The narrowband PIC (NB-PIC) [28] has a complexity per user symbol of

$$O_{\mathrm{NB\text{-}PIC}} = 4 K N_r I \left[(K-1)N_q + 1\right] \tag{31}$$

while the wideband PIC (WB-PIC) [12], instead, which operates at the symbol rate, has a complexity of     1 OWB-PIC = 4K NS Nq + Nr + 1 + + Nr . (32) 2K Finally, the reduced PIC (R-PIC) scheme proposed in [12], ¯ which limits the number of canceled interference terms to K, has a complexity of

¯ + Nr ) OR-PIC = 10K(K − 1)Nr Nq + I 4K(K × K(K − 1)Nr Nq log2 [(K − 1)Nr Nq ] .

(33)
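The building blocks of the FD–IC count, together with the NB-PIC count (31), can be evaluated with a few lines of code. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the number of data symbols per block is computed as $M/N_S$ with $M = P - L$ (assumed here from the PN-extended block format), and $N_q = 1$ is a placeholder value chosen only for illustration.

```python
import math


def front_end_ops(dft_size, symbols_per_block):
    """Per-symbol cost of one size-P DFT, read from (27) as 4 * (P/2) * log2(P) ops per block."""
    return 4 * (dft_size / 2) * math.log2(dft_size) / symbols_per_block


def dfe_ops(dft_size):
    """FF and FB filtering of one DFE block: 8P ops per block."""
    return 8 * dft_size


def cascade_ops(dft_size):
    """Two DFTs plus two SP operations per block: 4P*log2(P) + 8P ops."""
    return 4 * dft_size * math.log2(dft_size) + 8 * dft_size


def nb_pic_ops(num_users, num_fingers, num_iterations, n_q):
    """Per-symbol cost of the narrowband PIC, following (31)."""
    return 4 * num_users * num_fingers * num_iterations * ((num_users - 1) * n_q + 1)


# Hypothetical configuration: P = 256, L = 16, N_S = 16, N_r = 16, I = 3.
P, L, N_S, N_r, I = 256, 16, 16, 16, 3
symbols_per_block = (P - L) / N_S  # assumes M = P - L

print("front end, per symbol:", front_end_ops(P, symbols_per_block))
print("DFE, per block:", dfe_ops(P))
print("cascade, per block:", cascade_ops(P))
for K in (4, 8, 16):
    print("NB-PIC, K =", K, ":", nb_pic_ops(K, N_r, I, n_q=1))
```

With $N_q = 1$ the NB-PIC count reduces to $4 N_r I K^2$, which makes its quadratic growth in the number of users explicit.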

Some numerical results for various SIC and PIC strategies are reported for a configuration encompassing an SP factor of $N_S = 16$, a channel with maximum length $L = 16$, a full rake receiver ($N_r = 16$), and a variable number of active users. For the FD–IC schemes, the length of the DFTs is $P = 256$, and for the R-PIC $\bar{K}/((K-1)N_r N_q) = 0.125$, which has been shown to ensure good performance at a low complexity [12]. The number of PIC iterations is $I = 3$, while for each PIC the number of equalization iterations is $\nu = 1$. For FD–SIC, the number of equalization iterations is $\nu = 4$.

Fig. 6 shows the number of operations as a function of the number of active users for various implementation schemes. NB-PIC has the highest computational complexity: in fact, its number of operations grows quadratically with the number of users. WB-PIC and R-PIC significantly reduce the number of operations, which nevertheless remains higher than that of the FD–IC scheme. The FD–IC scheme has a complexity only about three times that of the rake receiver, thanks to the FD implementation of the filters. Lastly, we observe that FD–SIC shows a slightly lower complexity than FD–PIC because it requires a lower number of MAI cancellations.

Fig. 6. Complexity of various PIC and SIC techniques in terms of the number of operations versus the number of active users.

VI. PERFORMANCE RESULTS

For an uplink broadband communication system, in this section we show some performance comparisons between FD–IC schemes with RAKE and DFE blocks. In the considered scenario, the transmitted user signals undergo independent dispersive Rayleigh fading channels with an exponential power delay profile. We assume that the channel does not change at least for the duration of one PN extended block and that perfect channel estimation is available at the receiver. The root mean square delay spread is $\mathrm{rms}_{\mathrm{ds}} = 2T_m$, where $T_m = TM/(N_S P)$ is the chip period of the transmitted PN extended data block. Symbols belong to a binary phase-shift keying (BPSK) constellation and SP is performed with Walsh–Hadamard sequences having an SP factor $N_S = 16$. For all schemes, the length of the DFTs is $P = 256$ and the length of the PN sequence is $L = 16$, while the correction factor of the DFE design in (20) is $\eta = 0.7$. Moreover, we recall that for the DFE–FD–PIC scheme equalization is performed with $\nu = 1$, while for the DFE–FD–SIC scheme $\nu = 4$ and users are ordered with decreasing powers for detection. In any case, no channel coding is used.

Fig. 7 shows the BER of a transmission with no chip INT as a function of the number of active users $K$ for the RAKE–FD–PIC and the DFE–FD–PIC. The average performance of a single user on 500 channel realizations with an average $E_b/N_0$ of 16 dB is plotted for a number of PIC iterations ranging from 1 to 4. From Fig. 7, for the DFE–FD–PIC, we observe that the second PIC iteration yields a reduction of the BER, while further PIC iterations provide no significant gain. Indeed, we note that DFE–FD–PIC outperforms RAKE–FD–PIC only in the case of one active user. However, when MAI is present, RAKE–FD–PIC outperforms DFE–FD–PIC because of the error propagation phenomena in the block DFE. This is evident in the figure, where at the first PIC iteration of the DFE–FD–PIC no MAI has yet been canceled and the DFE error propagation is significant.

Fig. 7. Average BER versus number of users K for RAKE–FD–PIC (dashed lines) and DFE–FD–PIC (continuous lines) with the number of PIC iterations ranging from 1 to 4 and average Eb/N0 = 16 dB. No chip interleaver is used.

Nevertheless, as shown in Fig. 8, when INT is present, the DFE error propagation phenomenon is significantly alleviated and the equalizer is effective in reducing the ISI generated by the dispersive channel. In this case, DFE–FD–PIC outperforms RAKE–FD–PIC by more than one order of magnitude in BER. Note also that the PIC iterations beyond the second one contribute to the reduction of BER, especially when a large number of users are active. We conclude that INT yields a significant advantage in terms of performance with a negligible impact on complexity. With regard to Fig. 8, it is our understanding that for $I = 4$, the slightly better performance of the case $K = 13$ with respect to the single user case ($K = 1$) is due to the choice of the correction factor $\eta$ and to the error propagation phenomena in the equalization process.

In Fig. 9, we compare the performance of RAKE–FD–SIC and DFE–FD–SIC receivers. A range of equalization iterations from 1 to 4 is considered for the DFE and chip INT is included.
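The Appendix relates the benefit of chip INT to the way a single symbol error spreads over the subcarriers. As a hedged illustration, the sketch below builds the chip error block for one Walsh–Hadamard code without INT (zero padding) and with INT (one chip every $P/N_S$ positions) and compares their DFTs with the $N_S$-point DFT $C$ of the code. All function names, the naive DFT, the chosen code index, and the error value 2 are assumptions made for illustration; the normalization of the power by its average is likewise an assumption.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT, adequate for the small blocks used here."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * p / n) for m in range(n))
            for p in range(n)]


def walsh_hadamard(n):
    """Rows of the n-by-n Sylvester-Hadamard matrix (n a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows


def error_blocks(code, block_len, eps):
    """Chip error blocks for one symbol error eps: (without INT, with INT)."""
    ns = len(code)
    step = block_len // ns
    no_int = [eps * c for c in code] + [0.0] * (block_len - ns)  # zero padding
    with_int = [0.0] * block_len
    for i, c in enumerate(code):
        with_int[i * step] = eps * c  # one chip every P/N_S positions
    return no_int, with_int


def normalized_power(spectrum):
    """|E_p|^2 divided by its average over p, so the mean is 1 (assumed normalization)."""
    power = [abs(v) ** 2 for v in spectrum]
    mean = sum(power) / len(power)
    return [v / mean for v in power]


N_S, P, EPS = 16, 256, 2.0      # EPS: hypothetical symbol error value
code = walsh_hadamard(N_S)[5]   # arbitrary code choice
C = dft(code)
e_no_int, e_int = error_blocks(code, P, EPS)
E_no_int, E_int = dft(e_no_int), dft(e_int)

step = P // N_S
dev_int = max(abs(E_int[p] - EPS * C[p % N_S]) for p in range(P))
dev_no_int = max(abs(E_no_int[q * step] - EPS * C[q]) for q in range(N_S))
print("max deviation from repeated C (INT):", dev_int)
print("max deviation at every (P/N_S)-th bin (no INT):", dev_no_int)
```

With INT the $P$-point DFT is $C$ repeated $P/N_S$ times (scaled by the error value), whereas without INT it is an interpolated version of $C$ that coincides with it at every $(P/N_S)$-th bin. The lists returned by `normalized_power` can be sorted to draw empirical cdfs of the kind discussed in the Appendix.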


Fig. 8. Average BER versus number of users K for RAKE–FD–PIC (dashed lines) and DFE–FD–PIC (continuous lines) with the number of PIC iterations ranging from 1 to 4 and average Eb/N0 = 16 dB. Chip INT is used.

Fig. 9. Average BER versus number of users K for RAKE–FD–SIC (dashed line) and DFE–FD–SIC (continuous lines) with the number of equalization iterations ranging from 1 to 4 and average Eb/N0 = 16 dB. Chip INT is used.

Fig. 10. Average BER versus average Eb/N0 for RAKE–FD–PIC (dashed lines) and DFE–FD–PIC (continuous lines) with the number of PIC iterations ranging from 1 to 4 in a fully loaded system (K = 16). Chip INT is used.

The equalization of ISI provided by the DFE yields a significant performance improvement over the RAKE–FD–SIC receiver. However, equalization iterations beyond the second provide a negligible gain. This can be explained by observing that while in DFE–FD–PIC each iteration involves both IC and equalization very closely, in DFE–FD–SIC the two processes are more loosely related and in particular each equalization iteration reduces only ISI. As a result, DFE–FD–PIC is more efficient in the reduction of MAI than DFE–FD–SIC, as confirmed by the comparison of Figs. 8 and 9.

Fig. 10 shows the average BER as a function of the average $E_b/N_0$ of the RAKE–FD–PIC and the DFE–FD–PIC for a fully loaded system ($K = 16$). From the figure, we observe that DFE–FD–PIC, after three PIC iterations, does not exhibit an error floor due to MAI.

As we observed in Section II-A, the PN extension requires the quasi-synchronous transmission of all users, because all user channel impulse responses must fall within the PN extension slot. On the other hand, when no PN sequence is appended to each chip block, interference among blocks arises and (7) does not hold anymore. Nevertheless, using the above structures, Fig. 11 shows the average BER as a function of the number of active users when the information data blocks are not PN extended. An average $E_b/N_0$ of 20 dB is considered. By comparison with Fig. 8, we observe a performance degradation of about one order of magnitude in BER for the DFE–FD–PIC. Moreover, its performance turns out to be almost independent of the number of active users because it is mostly limited by the interblock interference.

Fig. 11. Average BER versus number of users K for RAKE–FD–PIC (dashed lines) and DFE–FD–PIC (continuous lines) with the number of PIC iterations ranging from 1 to 4 and average Eb/N0 = 20 dB when no PN extension is applied to each chip block. Chip INT is used.

VII. CONCLUSION

In this paper, new IC and equalization schemes for a wideband CDMA system have been proposed. The performance of the proposed schemes has been evaluated in an uplink


broadband transmission. The main advantages of the proposed structures are the significant reduction of BER and implementation complexity over existing IC techniques. These results are achieved by a special transmission format and the use of a block DFE in the FD at the receiver.

APPENDIX
THE EFFECT OF INT IN THE FD

We assume that in a block of $P$ symbols, the user $k$ detector, at iteration $l$ of the equalizer, introduces one error, say in the first position. We indicate this error with

$$\epsilon^{(k)} = \hat{d}^{(k,l)}(1) - d^{(k)}(1). \tag{34}$$

To simplify the notation, hereafter we omit both index $k$ and $l$. If $c = [c(0), c(1), \ldots, c(N_S - 1)]$ is the SP code, in the absence of chip INT, after SP, the symbol error $\epsilon$ will generate a chip error vector $\epsilon c$. In particular, the error block of $P$ chips is given by

$$e = \hat{s} - s = [\epsilon c, 0, \ldots, 0]. \tag{35}$$

Noting that $e$ can be obtained by zero padding of $\epsilon c$, by indicating with $C$ the $N_S$ size DFT of $c$, the $P$ size DFT of $e$ can be written as an interpolation of $C$ with the filter

$$\phi(n) = \frac{e^{-j2\pi n \frac{N_S - 1}{2P}} \sin\left(\pi n \frac{N_S}{P}\right)}{N_S \sin\left(\pi \frac{n}{P}\right)}. \tag{36}$$

In fact

$$E_p = \sum_{m=0}^{P-1} e_m\, e^{-j\frac{2\pi m p}{P}} = \epsilon \sum_{q=0}^{N_S - 1} C_q\, \phi\left(p - q\frac{P}{N_S}\right) \tag{37}$$

where $p = 0, 1, \ldots, P - 1$. Instead, when INT is present, the error block of $N_S$ chips becomes

$$e = \hat{s} - s = \epsilon\,[c(0), 0, \ldots, 0, c(1), 0, \ldots, 0, \ldots, c(N_S - 1), 0, \ldots, 0]. \tag{38}$$

In this case, $e$ is obtained by interpolation of $\epsilon c$. The corresponding $P$ size DFT of $e$ is the repetition of $\epsilon C$ $P/N_S$ times, i.e.,

$$E = \epsilon\,[C, \ldots, C]. \tag{39}$$

To compare the error energy in the FD in the presence/absence of INT, in Fig. 12, using (37) and (39), we evaluate the cumulative distribution function (cdf) of the variable

$$\zeta_p = \frac{|E_p|^2}{\epsilon^2}, \qquad p = 0, 1, \ldots, P - 1 \tag{40}$$

over all subcarriers and in the entire set of the Walsh–Hadamard codes with length $N_S = 16$. Noting that $E[\zeta_p] = 1$, we observe that in the presence of INT, the sequence $\{E_p\}$, $p = 0, 1, \ldots, P - 1$, exhibits a reduced dispersion around the mean value; hence, the error energy is more uniformly distributed among the various subcarriers of the system. This yields an increased resistance against the error propagation phenomena in block DFE.

Fig. 12. cdf of the normalized power ζ with (continuous line) and without (dashed line) INT. Walsh–Hadamard codes with NS = 16.

REFERENCES

[1] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, 2nd ed. New York: Wiley, 2000.
[2] J. G. Proakis, Digital Communications. New York: McGraw-Hill, 1994.
[3] S. Verdú, Multiuser Detection. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[4] A. Klein, “Data detection algorithms specifically designed for the downlink of CDMA mobile radio systems,” in Proc. Vehicular Technology Conf. (VTC), Phoenix, AZ, 1997, pp. 203–207.
[5] I. Ghauri and D. T. M. Slock, “Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences,” in Proc. 32nd Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, 1998, pp. 650–654.
[6] X. Wang and H. V. Poor, “Iterative (Turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, Jul. 1999.
[7] S. R. Chaudry and A. H. Sheikh, “Performance of a dual-rate DS-CDMA-DFE in an overlaid cellular system,” IEEE Trans. Veh. Technol., vol. 48, no. 3, pp. 683–695, May 1999.
[8] M. Abdulrahman, A. U. H. Sheikh, and D. D. Falconer, “Decision feedback equalization for CDMA in indoor wireless communications,” IEEE J. Sel. Areas Commun., vol. 12, no. 4, pp. 698–706, May 1994.
[9] L.-M. Chen and B.-S. Chen, “A robust adaptive DFE receiver for DS-CDMA systems under multipath fading channels,” IEEE Trans. Commun., vol. 49, no. 7, pp. 1523–1532, Jul. 2001.
[10] Y. Ma and T. J. Lim, “Linear and nonlinear chip-rate minimum mean-square multiuser CDMA detection,” IEEE Trans. Commun., vol. 49, no. 3, pp. 530–542, Mar. 2001.
[11] J. F. Rossler, L. H.-J. Lampe, W. H. Gerstacker, and J. B. Huber, “Decision-feedback equalization for CDMA downlink,” in Proc. Vehicular Technology Conf. (VTC), Birmingham, AL, 2002, vol. 2, pp. 816–820.
[12] A. Nahler, R. Irmer, and G. Fettweis, “Reduced and differential parallel interference cancellation for CDMA systems,” IEEE J. Sel. Areas Commun., vol. 20, no. 2, pp. 237–247, Feb. 2002.
[13] A. M. Chan and G. W. Wornell, “A class of block-iterative equalizers for intersymbol interference channels: Fixed channel results,” IEEE Trans. Commun., vol. 49, no. 11, pp. 1966–1976, Nov. 2001.
[14] S. Beheshti, S. H. Isabelle, and G. W. Wornell, “Joint intersymbol and multiple-access interference suppression algorithms for CDMA systems,” Eur. Trans. Telecommun., vol. 9, no. 5, pp. 403–418, Sep./Oct. 1998.
[15] N. Benvenuto and S. Tomasin, “On the comparison between OFDM and single carrier modulation with a DFE using a frequency domain feedforward filter,” IEEE Trans. Commun., vol. 50, no. 6, pp. 947–955, Jun. 2002.
[16] T. Walzman and M. Schwartz, “Automatic equalization using the discrete frequency domain,” IEEE Trans. Inf. Theory, vol. IT-19, no. 1, pp. 59–68, Jan. 1973.
[17] D. Falconer, S. L. Ariyavisitakul, A. Bnyamin-Seeyar, and B. Eidson, “Frequency domain equalization for single-carrier broadband wireless systems,” IEEE Commun. Mag., vol. 40, no. 4, pp. 58–66, Apr. 2002.
[18] K. Yang, A. S. Madhkumar, and F. Chin, “Novel two-dimensional multistage interference cancellation scheme for uplink transmission of single carrier cyclic prefix-assisted CDMA system,” Inst. Elect. Eng. Proc., Commun., vol. 150, no. 4, pp. 287–292, Aug. 2003.
[19] A. Scaglione, G. B. Giannakis, and S. Barbarossa, “Redundant filterbank precoders and equalizers—Part I: Unification and optimal designs, and Part II: Blind channel estimation, synchronization and direct equalization,” IEEE Trans. Signal Process., vol. 47, no. 7, pp. 1988–2022, Jul. 1999.
[20] M. Huemer, L. Reindl, A. Springer, and R. Weigel, “Frequency domain equalization of linear polyphase channels,” in Proc. Vehicular Technology Conf. (VTC), Tokyo, Japan, May 2000, vol. 3, pp. 1698–1700.
[21] L. Deneire, B. Gyselinckx, and M. Engels, “Training sequence versus cyclic prefix—A new look on single carrier communication,” IEEE Commun. Lett., vol. 5, no. 7, pp. 292–294, Jul. 2001.
[22] A. L. C. Hui and K. B. Letaief, “Successive interference cancellation for multiuser asynchronous DS/CDMA detectors in multipath fading links,” IEEE Trans. Commun., vol. 46, no. 3, pp. 384–391, Mar. 1998.
[23] R. M. Buehrer, “Equal BER performance in linear successive interference cancellation for CDMA systems,” IEEE Trans. Commun., vol. 49, no. 7, pp. 1250–1258, Jul. 2001.
[24] J. Weng, G. Xue, T. Le-Ngoc, and S. Tahar, “Multistage interference cancellation with diversity reception for asynchronous QPSK DS/CDMA systems over multipath fading channels,” IEEE J. Sel. Areas Commun., vol. 17, no. 12, pp. 2162–2179, Dec. 1999.
[25] D. Divsalar, M. K. Simon, and D. Raphaeli, “Improved parallel interference cancellation for CDMA,” IEEE Trans. Commun., vol. 46, no. 2, pp. 258–268, Feb. 1998.
[26] R. M. Buehrer, N. S. Correal-Mendoza, and B. D. Woerner, “A simulation comparison of multiuser receivers for cellular CDMA,” IEEE Trans. Veh. Technol., vol. 49, no. 4, pp. 1065–1085, Jul. 2000.
[27] H. Elders-Boll, H. D. Schotten, and A. Busboom, “Efficient implementation of linear multiuser detectors for asynchronous CDMA systems by linear interference cancellation,” Eur. Trans. Telecommun. (ETT), vol. 9, no. 5, pp. 427–437, Sep. 1998.
[28] N. Correal, M. Buehrer, and B. Woerner, “A DSP-based DS-CDMA multiuser receiver employing partial parallel interference cancellation,” IEEE J. Sel. Areas Commun., vol. 17, no. 4, pp. 613–630, Apr. 1999.

Stefano Tomasin (S’00–M’02) received the Laurea and Ph.D. degrees in telecommunications engineering from the University of Padova, Padova, Italy, in 1999 and 2002, respectively. During the Ph.D. course, he was an Intern at IBM Research Labs in Zurich, Switzerland, and at Philips Research, Eindhoven, The Netherlands. Since November 2003, he has been a Member of the Faculty of the University of Padova. His research interests are in signal processing for digital communications, coding, and multiuser access techniques.

Nevio Benvenuto (S’81–M’82–SM’88) received the Laurea degree in electrical engineering from the University of Padova, Italy, in 1976, and the Ph.D. degree in electrical engineering from the University of Massachusetts, Amherst, in 1983. From 1983 to 1985, he was with AT&T Bell Laboratories, Holmdel, NJ, working on signal analysis problems. He spent the next 3 years alternating between the University of Padova, where he worked on communication systems research, and Bell Laboratories, as a Visiting Professor. From 1987 to 1990, he was a Member of the Faculty at the University of Ancona, Italy. From 1994 to 1995, he was a Member of the Faculty at the University of L’Aquila, Italy. Currently, he is a Professor at the Electrical Engineering Department, University of Padova. His research interests are in the areas of voice and data communications, digital radio, and signal processing.